What AWS’ Disruption Reveals About AI Infrastructure

An ongoing Amazon Web Services (AWS) disruption affecting millions highlights a critical vulnerability at the heart of modern digital and AI infrastructure.
As businesses increasingly migrate their core operations and machine learning (ML) pipelines to the cloud, the reliance on a few key providers exposes them to substantial risk.
An outage at a single data centre can have a cascading global impact, demonstrating that the foundations of our digital world could be more fragile than many assume.
In a statement, AWS confirms the issue: “We can confirm significant error rates for requests made to the DynamoDB endpoint in the US-EAST-1 Region.
“This issue also affects other AWS Services in the US-EAST-1 Region as well. During this time, customers may be unable to create or update Support Cases. Engineers were immediately engaged and are actively working on both mitigating the issue and fully understanding the root cause.”
Cloud dependency and AI infrastructure
The disruption originated in Amazon’s Northern Virginia data centres, impacting foundational services like DynamoDB and EC2.
These services are the database and computing power that thousands of companies rent to power their applications, from everyday consumer apps to complex AI model training and deployment. When these core components fail, the ripple effect is immediate.
A partial list of affected services includes:
- Snapchat
- Fortnite
- Duolingo
- Canva
- Wordle
- Lloyds
- Slack
- monday.com
- Bank of Scotland
- HMRC
- Zoom
- Barclays
- Vodafone
For AI and ML operations, this could mean a halt to data processing, model inference and the functioning of automated systems that rely on real-time data analysis.
The wide range of services affected, including Snapchat, Fortnite and numerous banking applications, highlights the deep integration of AWS into the digital ecosystem.
A pattern of disruption
This is not an isolated incident for the cloud computing giant. AWS has experienced several major outages over the years.
In 2012, a 20-hour Christmas Eve outage partially disrupted Netflix’s streaming service. Another December outage in 2021 caused problems for last-minute holiday shoppers.
In June 2023, a problem with AWS Lambda led to increased error rates across multiple services, affecting organisations like The Boston Globe and the Associated Press.
These events show that even the most advanced infrastructure remains vulnerable to failure. This pattern is not unique to AWS.
Microsoft Azure has also faced major downtime. In January 2023, a network issue there took down Microsoft Teams, Outlook and other Microsoft 365 services.
Such events can compromise more than just convenience. In regulated sectors like finance and healthcare, downtime can affect audit trails and jeopardise compliance.
Interconnectivity and business resilience
The interconnected nature of modern business systems means an outage at a major cloud platform can have far-reaching consequences.
George Foley, Technical Advisor at ESET Ireland, a subsidiary of global software company ESET, explains the situation: “When one of the major cloud platforms goes down, it reminds everyone how interconnected modern business systems have become.
“Even if your own website or app isn’t hosted on AWS, there’s a good chance something you use, from your CRM to your payment processor, is.
“Outages like this highlight the importance of having resilience plans in place, including backups and alternative routes for essential data and services.”
For businesses leveraging AI, Foley’s point is particularly salient. An AI system’s data pipeline might pull from various sources, its models may be hosted on one platform and its outputs integrated with another.
A failure in any part of this chain can bring the entire system down. Internet disruptions can inflict billions of dollars in annual losses through their impact on revenue, productivity and reputation.
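To see why a chain of dependencies is more fragile than any single link, the short Python sketch below multiplies the availability of each stage. The stage names and the 99.9% figures are hypothetical, chosen only for illustration, and the calculation assumes that the stages fail independently and that all of them must be up for the pipeline to work.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def chain_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes components fail independently, which real systems
    sharing one cloud region often do not.
    """
    return prod(availabilities)


def expected_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Expected minutes of downtime over the given period."""
    return (1.0 - availability) * period_minutes


# Hypothetical stages and availability figures, for illustration only.
pipeline = {
    "data_source": 0.999,
    "model_hosting": 0.999,
    "output_integration": 0.999,
}

combined = chain_availability(pipeline.values())
print(f"Combined availability: {combined:.4%}")
print(f"Expected downtime: {expected_downtime_minutes(combined):.0f} minutes per year")
```

Under these assumptions, the combined availability is always lower than that of the weakest stage, so each added dependency increases the expected downtime.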
According to a 2024 survey, 76% of global respondents run applications on AWS, and the service powers more than 90% of Fortune 100 companies. The question is therefore not whether outages will occur, but how organisations can build resilience to mitigate their impact when they do.
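One common resilience pattern is an "alternative route": if the primary provider fails, the request is retried against a fallback. The Python sketch below shows the idea in its simplest form. The function names and the two stand-in providers are hypothetical and do not call any real cloud API. A production setup would also need health checks, timeouts and data replication between regions or providers.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first successful result.

    Raises RuntimeError (chained to the last underlying error) if every
    provider fails, and ValueError if no providers are given.
    """
    if not providers:
        raise ValueError("at least one provider is required")
    last_error = None
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # broad on purpose: any failure triggers failover
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


# Hypothetical stand-ins for real service calls, for illustration only.
def primary_region_inference() -> str:
    raise ConnectionError("primary region unavailable")


def secondary_region_inference() -> str:
    return "prediction from secondary region"


result = call_with_failover([primary_region_inference, secondary_region_inference])
print(result)
```

The ordering of the list encodes the preference: the first entry is the normal route and later entries are used only when earlier ones raise an error.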



