Anthropic Warns Autonomous AI Risks Loss of Human Control
When one of the world's largest pure-play AI companies says the world should have the option to pause global AI development, it is likely because the cost of getting this wrong could be cataclysmic for humanity.
Such a pause would run against Anthropic's own economic interest, which underlines how seriously the company takes the risk.
At the root of this growing concern is what is called recursive self-improvement: AI designing and developing future versions of itself.
In a blog post titled "When AI Builds Itself", Anthropic argues that autonomous self-improvement could arrive much sooner than people think.
“Taken far enough and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor,” the company says.
A closing coding loop
An engineer at Anthropic today ships eight times as much code per quarter as they did in 2021.
The path there began much like any other company's workflow: skilled developers wrote code by hand until chatbots arrived.
These natural-language systems made it easier to turn problems into code snippets, which developers could then paste into their integrated development environments (IDEs).
Coding agents went further and removed the human middleman, writing and editing code and modifying entire files themselves.
Autonomous agents then took things to the next level, pushing the already scarce human input toward more creative work.
The worry is that a system this capable at writing code may also become highly capable at finding new ways to improve itself.
Taken to its conclusion, this development could eventually close the loop entirely, leaving little or no role in it for humans.
AI competency gains accelerate
The length of tasks AI models can reliably complete on their own now doubles roughly every four months.
That is a striking acceleration: the earlier trend showed a doubling roughly every seven months.
Taking Claude 3 Opus from 2024 as a yardstick, that model could complete on its own a task that takes a human about four minutes.
Claude 3.7 Sonnet, a year later, could manage tasks about one and a half hours long.
Another year on, Claude Opus 4.6 could work through 12-hour tasks on its own. If the trend holds, models could reliably complete a week's worth of human work by 2027.
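The growth described above is exponential, so the figures can be checked with a little arithmetic. The sketch below is a back-of-the-envelope illustration, not how METR actually measures task horizons. It assumes exactly 12 months between each model and treats the article's rounded task lengths (4 minutes, about 90 minutes, 12 hours) as exact. The function names are hypothetical.

```python
import math


def doubling_time_months(start_minutes: float, end_minutes: float, elapsed_months: float) -> float:
    """Doubling time implied by exponential growth between two measurements."""
    doublings = math.log2(end_minutes / start_minutes)  # how many times the length doubled
    return elapsed_months / doublings


def project_minutes(current_minutes: float, doubling_months: float, months_ahead: float) -> float:
    """Extrapolate task length, assuming the doubling time stays constant."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


# Task lengths quoted in the article; the exact 12-month gaps are an assumption.
OPUS_3_MINUTES = 4            # 2024
SONNET_3_7_MINUTES = 90       # about 1.5 hours, a year later
OPUS_4_6_MINUTES = 12 * 60    # 12 hours, another year later

early = doubling_time_months(OPUS_3_MINUTES, SONNET_3_7_MINUTES, 12)
recent = doubling_time_months(SONNET_3_7_MINUTES, OPUS_4_6_MINUTES, 12)
print(f"2024 to 2025 doubling time: {early:.1f} months")
print(f"2025 to 2026 doubling time: {recent:.1f} months")

# One more year at a four-month doubling time means three more doublings.
projected_hours = project_minutes(OPUS_4_6_MINUTES, 4, 12) / 60
print(f"Projected task length a year later: {projected_hours:.0f} hours")
```

The step from 1.5 hours to 12 hours is exactly three doublings in a year, matching the four-month figure. Three more doublings from 12 hours give about 96 hours of human work, more than a standard 40-hour working week.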
Benchmarks tell the same story across the technology industry:
- Models are saturating SWE-bench, a standardised test built from real-world software projects, in which systems are handed an open-source codebase and a bug report and asked to fix the issue
- Performance today is a major upgrade from two years ago, when models scored in the low single digits
- Similar results are appearing on CORE-Bench, which tests whether AI agents can carry out research tasks such as reproducing the results of published papers
On long-duration coding tasks measured by METR, Claude Mythos Preview scored at the upper end of what the organisation can measure without building new tasks.
Anthropic data from May 2026 shows that 80% of the code merged into the company's codebase was written by Claude, alongside significant improvements in quality.
As the AI giant puts it: “Claude writes code that works.”
Frontier systems risk total misalignment
Marina Favaro, Lead at the Anthropic Institute, co-authored the blog post and shared further thoughts on LinkedIn.
She notes that recursive self-improvement has not arrived, nor is it inevitable.
She says: “But if these trends continue, AI systems designing and building their own successors seems plausible.
“In that scenario, we expect that the pace of AI development will accelerate. This has the potential to bring enormous good to the world but it also creates loss of control risks.”
Anthropic narrows the range of possible futures down to three scenarios:
- AI capability gains stall
- Efficiency gains compound but run into a different bottleneck
- AI designs its own successors through fully recursive self-improvement
In the third scenario, the human role shrinks to overseeing and validating whatever AI systems produce in their virtual labs.
The elephant in the room is the alignment problem, the question Anthropic says it is least certain about.
Models may turn out to be aligned enough, and wise enough to halt development if something goes wrong. But misalignment still occurs in today's systems, however rarely.
That misalignment could compound as a misaligned AI redesigns itself, with each generation harder to understand, until humanity loses control. The same dynamic could also play out in physical AI systems and robotics.
Without a global coordination mechanism, companies and governments will have to make difficult decisions about safety while under competitive and geopolitical pressures.
Anthropic says it believes it would be good for the world to have the option to slow or temporarily pause frontier AI development, so that societal structures and alignment research can keep pace with the technology.



