How Anthropic’s Claude Opus 4.5 Beats The Best Human Coders

Claude Opus 4.5 is, the company says, currently the best model in the world for coding, agents and computer use | Credit: Anthropic
Anthropic releases Claude Opus 4.5, its most robustly aligned, safest and most token-efficient AI model, with advanced coding and agentic AI capabilities

There’s a tech storm ahead. 

At the centre of it is Anthropic’s latest AI model: Claude Opus 4.5.

Faster, more token-efficient and capable of handling a wide range of everyday tasks and tools, Claude Opus 4.5 is, the company says, currently the best model in the world for coding, agents and computer use.

In a notoriously difficult test that Anthropic gives to prospective engineering candidates, Opus 4.5 did not just win the trophy but stole the whole show, scoring better than any human candidate ever has.

Claude Opus 4.5 significantly outperforms prominent AI models in software engineering tasks | Credit: Anthropic

Opus 4.5 has a diverse resume: beyond impressive software engineering, it also has a knack for mathematical reasoning and creative problem solving.

The model is also very good at managing a team of sub-agents, enabling complex and effective coordination within multi-agent systems.

When tackling exhaustive, long-running tasks, this multi-agent approach can help achieve user objectives more quickly, with improved token efficiency and higher precision.
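
For readers who want a concrete picture, the coordination pattern described above can be sketched in a few lines of Python. This is an illustration only, not Anthropic's implementation: `run_subagent` and `orchestrate` are hypothetical names, and the sub-agent here is a stub that returns a string rather than calling any model.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for a sub-agent; a real one would call a model."""
    return f"done: {subtask}"


def orchestrate(subtasks: list[str], max_workers: int = 4) -> list[str]:
    """Fan subtasks out to sub-agents in parallel and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input subtasks.
        return list(pool.map(run_subagent, subtasks))


if __name__ == "__main__":
    for line in orchestrate(["search logs", "write tests", "update docs"]):
        print(line)
```

The lead agent splits a long task into independent pieces, hands each to a worker and merges the answers. Running the pieces side by side is one plausible reason such a setup could finish long tasks sooner.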

Opus 4.5 can find clever paths around problems, as demonstrated by a test scenario in which it was asked to help a distressed customer while working as an airline service agent.

The model displays creative problem-solving ability by finding a loophole to help the customer. Although the benchmark flagged this as a failure because the model did not technically perform as intended, the scenario shows Claude Opus 4.5's capability to find innovative solutions to real-world problems | Credit: Anthropic

How safe is Claude Opus 4.5?

An AI model’s ability to think outside the box and devise clever solutions by working around the rules could, in certain contexts, be considered ‘reward hacking’, the term for when AI models ‘game the rules’ to get rewards.

This raises serious questions about AI alignment, but Anthropic says that Claude Opus 4.5 is the most robustly aligned model it has released to date.

Anthropic says that Claude Opus 4.5 is the most robustly aligned model it has released to date | Credit: Anthropic

Opus 4.5 shows low scores for concerning behaviour, which includes both undesirable actions taken by the model itself and cooperation with human misuse.

The model also shows greater resilience against prompt injection attacks than most other prominent AI models.

Inside Anthropic’s Claude for Chrome and Claude for Excel

Claude for Chrome is a browser extension that lets Claude handle tasks across multiple browser tabs.

From clicking buttons and filling in forms to summarising meetings, Claude for Chrome runs in the background, autonomously performing the everyday tasks users need from just a simple prompt.

Leveraging Claude Opus 4.5’s performance, Claude for Excel lets users easily work with complex spreadsheets. 

Turning Claude into your thinking partner

The Opus 4.5 model also brings upgrades to the Claude Developer Platform by improving Claude Code’s performance.

Claude Code now asks clarifying questions up front and creates a plan of execution, which the user can review and edit before the model starts work.

As a result, Claude Code can deliver results tailored to the user’s expectations without tiring rounds of iteration.
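
The plan-then-execute flow can be pictured with a short Python sketch. It is a simplified illustration under stated assumptions, not how Claude Code is actually built: `Plan`, `draft_plan` and `run_with_review` are hypothetical names, and the "execution" step only returns strings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    goal: str
    steps: list[str] = field(default_factory=list)


def draft_plan(goal: str, answers: dict[str, str]) -> Plan:
    """Hypothetical planner: turn a goal plus clarifying answers into steps."""
    steps = [f"{question} -> {answer}" for question, answer in answers.items()]
    steps.append(f"implement: {goal}")
    return Plan(goal=goal, steps=steps)


def run_with_review(plan: Plan, review: Callable[[Plan], Plan]) -> list[str]:
    """Let the user edit the plan first; only approved steps are executed."""
    approved = review(plan)
    return [f"executed {step}" for step in approved.steps]


if __name__ == "__main__":
    plan = draft_plan("add login page", {"framework?": "React"})
    # The reviewer here approves the plan unchanged.
    print(run_with_review(plan, lambda p: p))
```

The point of the design is that nothing runs until the `review` callback returns, so any correction the user makes lands before work starts rather than after a failed attempt.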

Rahul Patil, Chief Technology Officer (CTO) at Anthropic, says that he is excited to see what developers build next.

Rahul Patil, CTO at Anthropic, says that Claude Opus 4.5 can bring 20% accuracy improvements in financial modelling

“GitHub, Cursor, Replit and Windsurf are already integrating it,” he says.

“Claude Opus 4.5 is powerful enough for Rakuten's agents to autonomously refine themselves in 4 iterations and precise enough for 20% accuracy improvements in financial modelling.

“The model is also much better at frontend design. Our platform now supports longer-running agents, Claude Code is available on desktop (research preview) and we’re also launching an updated Plan Mode.”

Mario Rodriguez, CPO at GitHub

Mario Rodriguez, Chief Product Officer (CPO) at GitHub, says: “Claude Opus 4.5 delivers high-quality code and excels at powering heavy-duty agentic workflows with GitHub Copilot.

“Early testing shows it surpasses internal coding benchmarks while cutting token usage in half – and is especially well-suited for tasks like code migration and code refactoring.”
