Anthropic Uncovers Landmark AI-Led Cyberattack Campaign

Anthropic has revealed that a group it assesses with high confidence to be Chinese state-sponsored used its AI to carry out what the company describes as the first known large-scale cyberespionage operation driven primarily by artificial intelligence.
The company says this marks the beginning of a new phase in cyber conflict, in which AI-enabled systems can collect intelligence and execute attacks with minimal human oversight.
The campaign was detected in mid-September 2025. The attackers exploited the agentic capabilities of Claude Code, Anthropic’s AI coding tool, to target approximately thirty high-value entities worldwide, including technology firms, financial institutions, chemical producers and government bodies. According to Anthropic, only a small number of these intrusion attempts succeeded.
The AI carried out an estimated 80% to 90% of the intrusion work autonomously, with human operators intervening only for key strategic decisions. This underscores a major turning point in the evolution of large-scale cybersecurity threats.
AI’s autonomous cyber offensive
In a 13-page report outlining the mechanics of the breach, Anthropic revealed that the campaign leveraged recent breakthroughs in AI – specifically intelligence, agency and tool integration – to orchestrate a multi-stage cyberattack with unprecedented autonomy.
Unlike prior operations, which depended heavily on human direction, this incident employed Claude Code not merely as an advisory system but as an active agent executing advanced hacking sequences.
Human operators initiated the campaign by defining targets and strategic objectives, while the AI independently managed reconnaissance, vulnerability scanning, exploit creation, credential theft, lateral movement and data extraction.
So how did the threat actors automate the assault?
The group evaded Claude Code’s built-in safeguards by fragmenting malicious instructions into seemingly benign tasks, deceiving the system into treating its actions as part of a sanctioned cybersecurity assessment.
As a result, at the peak of the attack Claude made thousands of requests, often several per second, a pace no human team could feasibly replicate.
Speaking to The Wall Street Journal, Anthropic’s Head of Threat Intelligence Jacob Klein says the hackers conducted their attacks “literally with the click of a button, and then with minimal human interaction”.
He adds: “The human was only involved in a few critical chokepoints, saying, ‘Yes, continue,’ ‘Don’t continue,’ ‘Thank you for this information,’ ‘Oh, that doesn’t look right, Claude, are you sure?’”
The six stages of the attack
- Campaign initialisation and target selection: Human operators input the target entities, tricking Claude into compliance via role-playing scenarios
- Reconnaissance and attack surface mapping: Claude autonomously scanned networks, enumerated services and identified key infrastructure
- Vulnerability discovery and validation: The AI generated and tested exploit payloads silently, analysing system responses to confirm vulnerabilities
- Credential harvesting and lateral movement: Claude extracted and validated access credentials independently, mapping internal network privileges
- Data collection and intelligence extraction: The AI parsed vast amounts of stolen data to prioritise intelligence based on value
- Documentation and handoff: Claude produced detailed reports on attack progress, aggregated findings and prepared handoff materials for subsequent teams
What does this mean for cybersecurity?
This operation exemplifies how agentic AI systems can significantly reduce the barriers to executing advanced cyberattacks.
With the capability to autonomously manage prolonged, large-scale operations, AI could soon enable less experienced or smaller threat actors to launch campaigns once restricted to nation-state capabilities.
The autonomous agent model represents a clear escalation from earlier “vibe hacking” incidents, where human oversight remained integral.
However, the operation was not without flaws.
Investigators found that Claude occasionally generated false data, fabricated credentials or exaggerated the success of certain exploits, errors that the human operators had to catch and verify.
This unreliability remains one of the last barriers to fully autonomous cyberattacks.



