NVIDIA's Cosmos 3: The World's First Fully Open AI Omnimodel

NVIDIA has released Cosmos 3, an open world foundation model designed for physical AI applications.
The model uses a mixture-of-transformers architecture that integrates vision reasoning, world generation and action prediction within one system.
According to NVIDIA, Cosmos 3 is the world's first fully open omnimodel. The system can process and generate text, images, video, ambient sound and actions with what the company describes as leading physics accuracy, which NVIDIA says could shorten training times for physical AI.
Physical AI readiness
According to McKinsey, robotics is ready to cross the gap from simulation to reality. The consultancy adds that robots now operate in dynamic settings where adaptability and autonomy are essential.
NVIDIA says Cosmos 3 enables robots, autonomous vehicles and vision agents to function in the real world even when training data is limited and simulation stacks are fragmented. The model's architecture pairs a reasoning transformer with an expert generation transformer.
This combination lets Cosmos 3 reason about object interactions, motion and spatiotemporal relationships before producing video and action trajectories.
The Cosmos platform now contains new datasets for robotics, physics, human motion, autonomous driving, warehouse safety and spatial reasoning.
The platform also includes new physical AI agent skills for neural scene reconstruction, defect-image generation and video augmentation.
According to Deloitte, greater integration of AI capabilities into robotic systems and the emergence of specialised foundation models mean robots can expand across multiple industries and applications, including smart factories.
The firm predicts that the cumulative global installed capacity of industrial robots could reach 5.5 million units by 2026.
Multimodal reasoning capabilities
Jensen Huang, Founder and Chief Executive of NVIDIA, says: "The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models.
"The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, AVs and vision AI that perceive, reason, plan and act in the physical world."
NVIDIA says Cosmos 3 Super, part of the lineup, is designed for post-training robotics and autonomous vehicle models that require the highest physics accuracy and generation quality.
The system can generate synthetic data and scene variations, then support post-training with embodiment-specific behaviour and environment data for tasks ranging from pick-and-place to dexterous manipulation.
Developers can deploy Cosmos 3 as a vision-language model or as the backbone for world action models.
The system also functions as a world model or video foundation model that simulates physical environments and predicts future world states for training and evaluation.
Industry adoption patterns
Physical AI developers are building on the Cosmos platform across industries.
Agile Robots, Doosan Robotics, LG Electronics, Samsung Electronics and Skild AI are using the platform for robotics.
Li Auto is deploying the platform for autonomous vehicles.
Centific, Fogsphere, Linker Vision, Milestone Systems and Yuan are using the platform for vision AI agents to power industrial AI and smart space applications.
NVIDIA announced Cosmos 3 alongside the NVIDIA Cosmos Coalition, which the company describes as a global collaboration between world model builders and AI developers. Members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI.
According to NVIDIA, the coalition will advance open world models across industries. Members can contribute models, research and evaluation techniques while using Cosmos 3 technologies.

