How Red Hat and AWS Bring Scalable Gen AI to the Enterprise

Red Hat is deepening its long-standing partnership with Amazon Web Services (AWS) to make running Gen AI at scale on AWS simpler and more cost-efficient for enterprises.
The new collaboration, announced at AWS re:Invent 2025, brings together Red Hat AI, OpenShift and Ansible with AWS’ Trainium and Inferentia silicon to deliver a full-stack, production-ready pathway from AI pilots to governed, organization-wide deployment.
Colin Brace, Vice President of Annapurna Labs at AWS, says: “Enterprises demand solutions that deliver exceptional performance, cost efficiency and operational choice for mission-critical AI workloads.
“AWS designed its Trainium and Inferentia chips to make high-performance AI inference and training more accessible and cost-effective.
“Our collaboration with Red Hat provides customers with a supported path to deploying generative AI at scale, combining the flexibility of open source with AWS infrastructure and purpose-built AI accelerators to accelerate time-to-value from pilot to production.”
Silicon-first Gen AI for the enterprise
The rapid rise of Gen AI is compelling CIOs to rethink how they provision compute for inference, where ongoing costs can quickly exceed initial training budgets.
Analyst firm IDC expects that “by 2027, 40% of organisations will use custom silicon, including ARM processors or AI/ML-specific chips, to meet rising demands for performance optimisation, cost efficiency and specialised computing”.
That helps explain why enterprises are looking beyond general-purpose GPUs for AI at scale.
AWS Trainium and Inferentia are purpose-built for this need, with AWS stating that its latest Trainium2-based instances can deliver between 30% and 40% better price-performance than current GPU-based Amazon EC2 instances for Gen AI workloads.
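To put the 30% to 40% figure in budget terms, the short sketch below converts a price-performance gain into the cost of running the same workload. It assumes "price-performance" means work done per dollar spent, which is a common reading but not a definition given by AWS here; the function name `relative_cost` is purely illustrative.

```python
def relative_cost(price_performance_gain: float) -> float:
    """Cost of a fixed workload relative to the baseline.

    Assumption: price-performance = work done per dollar, so a gain of
    0.30 means 1.30x as much work for each dollar spent.
    """
    if price_performance_gain <= -1.0:
        raise ValueError("gain must be greater than -1.0")
    return 1.0 / (1.0 + price_performance_gain)


# The two ends of the range quoted for Trainium2-based instances.
for gain in (0.30, 0.40):
    saving = 1.0 - relative_cost(gain)
    print(f"{gain:.0%} better price-performance -> about {saving:.1%} lower cost for the same work")
```

Under that assumption, a 30% to 40% price-performance gain works out to roughly 23% to 29% lower spend for an unchanged workload, rather than a 30% to 40% cut in the bill.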
Against this backdrop, Red Hat is positioning its open hybrid cloud stack to abstract model operations from specific accelerators while still taking advantage of the economics of AWS custom silicon.
The aim? To let IT decision-makers standardise on a common AI operations layer, all while retaining the freedom to mix and match models and hardware as requirements evolve.
Red Hat AI Inference Server meets AWS chips
Central to the announcement is Red Hat AI Inference Server, built on the high-performance vLLM inference framework and now being optimised to run on AWS Trainium and Inferentia.
By establishing a common inference layer tuned for AWS’ AI accelerators, the companies say customers can target any supported Gen AI model while gaining higher throughput, lower latency and improved price-performance versus comparable GPU instances.
Red Hat and AWS are also collaborating upstream on an AWS AI chip plugin for vLLM, reinforcing both firms’ open source credentials and ensuring performance improvements flow back to the broader community.
This work is closely linked to llm-d, an open source project for distributed inference at scale that Red Hat has now brought into Red Hat OpenShift AI 3 as a commercially supported capability.
“By enabling our enterprise-grade Red Hat AI Inference Server, built on the innovative vLLM framework, with AWS AI chips, we’re empowering organisations to deploy and scale AI workloads with enhanced efficiency and flexibility,” says Joe Fernandes, Vice President and General Manager of the AI Business Unit at Red Hat.
“Building on Red Hat’s open source heritage, this collaboration aims to make generative AI more accessible and cost-effective across hybrid cloud environments.”
OpenShift, Neuron and Ansible automation
For Kubernetes-first organisations, Red Hat and AWS have co-developed an AWS Neuron operator spanning Red Hat OpenShift, Red Hat OpenShift AI and Red Hat OpenShift Service on AWS (ROSA).
The operator gives platform teams a supported, Kubernetes-native path to target AWS accelerators, streamlining lifecycle management and making it easier to align AI deployments with existing cluster operations and policies.
The collaboration also extends into automation via the amazon.ai Certified Ansible Collection for Red Hat Ansible Automation Platform, enabling teams to declare and orchestrate AWS AI services, agents and monitoring as code.
As Gen AI stacks grow more complex, this kind of idempotent, auditable automation helps enterprises keep AI deployments consistent across environments while meeting governance and compliance requirements.
What are customers and analysts saying?
Real-world adopters are already using Red Hat OpenShift Service on AWS as a foundation for modernising mission-critical applications and embedding AI in production.
For CAE, a global provider of simulation and training solutions, the managed OpenShift service on AWS has become a key enabler for its digital transformation and AI integration strategy.
Jean-François Gamache, CIO and VP of Digital Services at CAE, says: “Modernising our critical applications with Red Hat OpenShift Service on AWS marks a significant milestone in our digital transformation.
“This platform supports our developers in focusing on high-value initiatives – driving product innovation and accelerating AI integration across our solutions.
“Red Hat OpenShift provides the flexibility and scalability that enable us to deliver real impact, from actionable insights through live virtual coaching to significantly reducing cycle times for user-reported issues.”
Industry analysts see the economics of inference as a defining issue for enterprise AI over the next few years.
“As AI inference costs escalate, enterprises are prioritising efficiency alongside performance,” explains Anurag Agrawal, Founder and Chief Global Analyst at Techaisle.
“This collaboration exemplifies Red Hat’s ‘any model, any hardware’ strategy by combining its open hybrid cloud platform with the distinct economic advantages of AWS Trainium and Inferentia.
“It empowers CIOs to operationalise generative AI at scale, shifting from cost-intensive experimentation to sustainable, governed production.”







