Article

AI Applications

How Claude’s Sonnet 4.5 Sets a New AI Coding Standard

By Kitty Wheeler

September 30, 2025

undefined mins

Share this article

Prioritise Us on Google

Share this article

Prioritise Us on Google

Dario Amodei, CEO of Anthropic, launches Claude Sonnet 4.5

Anthropic’s Claude Sonnet 4.5 sets a new benchmark by excelling in coding, multi-hour task execution and developer tools with impressive benchmark results

Software development is becoming a primary battleground for AI companies as models gain capabilities to write code and operate computers autonomously.

Now, Anthropic has released Claude Sonnet 4.5, a model designed for software development and computer operation tasks.

The model launches alongside product updates including checkpoints in Claude Code, the company’s command-line coding tool and expanded capabilities in its consumer applications.

The release also includes the Claude Agent SDK, infrastructure that Anthropic uses to build Claude Code.

The capabilities of Claude Sonnet 4.5

The toolkit provides developers with systems for managing memory across tasks, handling permission controls and coordinating multiple agents.

“Claude Sonnet 4.5 is our most powerful model to date.”
Anthropic

Anthropic has made this infrastructure available to developers building their own agent systems.

“Code is everywhere. It runs every application, spreadsheet and software tool you use,” Anthropic says.

“Being able to use those tools and reason through hard problems is how modern work gets done.”

Claude Sonnet 4.5 maintains pricing at $3 per million input tokens and $15 per million output tokens, matching the rate for Claude Sonnet 4.

The model is available through the Claude API using the identifier claude-sonnet-4-5.

Claude Sonnet 4.5’s scoring and industry impact

The model achieves a score of 61.4% on OSWorld, a benchmark that tests AI systems on computer tasks. This compares to 42.2% for Claude Sonnet 4, released four months earlier.

Claude Sonnet 4.5’s results | Credit: Anthropic

Additionally, on SWE-bench Verified, an evaluation measuring software coding abilities, Claude Sonnet 4.5 leads among tested models.

Anthropic reports the model maintains focus for over thirty hours on tasks involving multiple steps.

The company has also deployed these capabilities in its Chrome extension, which became available to Max subscribers who joined a waitlist in August.

Early customers have reported results from deployments.

Mario Rodriguez, CPO at GitHub

Mario Rodriguez, Chief Product Officer (CPO) at GitHub says: “Claude Sonnet 4.5 amplifies GitHub Copilot’s core strengths.

“Our initial evals show significant comprehension – enabling Copilot’s agentic experiences to handle complex, codebased-spanning tasks better.”

Eric Wendelin, Tech Lead, Gen AI for Developer Productivity at Netflix

Eric Wendelin, Tech Lead, Gen AI for Developer Productivity at Netflix adds: “Claude Sonnet 4.5 is excellent at software development tasks, learning our codebased patterns to deliver precise implementations.

“It handles everything from debugging to architecture with deep contextual understanding, transforming our development velocity.”

Additionally, experts in finance, law, medicine and STEM find that Sonnet 4.5 shows dramatically better domain-specific knowledge and reasoning compared to older models, including Opus 4.1.

Finance experts findings on Sonnet 4.5 | Credit: Anthropic

The matter of safety measures

Anthropic is releasing Claude Sonnet 4.5 under its AI Safety Level 3 protections, a framework that matches model capabilities with corresponding safeguards.

The company has implemented classifiers, which are filters designed to detect inputs and outputs related to chemical, biological, radiological and nuclear weapons.

These classifiers can flag content that does not pose risks. As a result, Anthropic has reduced these incorrect flags by a factor of 10 since initially describing them – and by a factor of two since releasing Claude Opus 4 in May.

The model also demonstrates reductions in behaviours including sycophancy, deception, power-seeking and encouragement of delusional thinking, according to Anthropic’s automated behavioural assessments.

Anthropic says Claude Sonnet 4.5 is “our most aligned frontier model yet,” adding that “Claude’s improved capabilities and our extensive safety training have allowed us to substantially improve the model’s behavior.”

Claude Sonnet 4.5’s misaligned behaviour scores | Credit: Anthropic

So far, Anthropic has introduced code execution and file creation capabilities directly into conversations in its consumer applications.

Users can now generate spreadsheets, presentations and documents within the chat interface.

The Claude API includes a context editing feature and memory tool designed to enable agents to operate for extended periods.

Furthermore, Anthropic is offering a research preview called Imagine with Claude, available to Max subscribers for five days.

The preview demonstrates Claude Sonnet 4.5 generating software in response to user requests without predetermined functionality or prewritten code.

Anthropic says it “built Claude Code because the tool we wanted didn’t exist yet.

“The Agent SDK gives you the same foundation to build something just as capable for whatever problem you’re solving.”

Company portals

Netflix

How Claude’s Sonnet 4.5 Sets a New AI Coding Standard

The capabilities of Claude Sonnet 4.5

Claude Sonnet 4.5’s scoring and industry impact

The matter of safety measures

Company portals

Netflix

Tags