Nvidia Releases Cosmos 3, a Suite of Open Physical AI Models
Cosmos 3 gives researchers and developers free access to state-of-the-art open models for physical AI, with the image generator beating all other open models in human-judged quality tests.
Reporting from 1 source: GIGAZINE.
Nvidia announced Cosmos 3, a suite of physical AI foundation models for robotics and autonomous driving, on June 1, 2026. The suite includes five open models, with the image generation model Cosmos3-Super-Text2Image and the video generation model Cosmos3-Super-Image2Video both ranking as the top-performing open models in third-party tests.
Nvidia released Cosmos 3, a suite of five open physical AI foundation models aimed at robotics and autonomous driving, on June 1, 2026 (Japan time). The models range from the 16-billion-parameter Cosmos3-Nano to the 65-billion-parameter Cosmos3-Super, with specialized variants for robot motion control, text-to-image generation, and image-to-video generation. Third-party evaluator Artificial Analysis ranked Cosmos3-Super-Text2Image as the top open image generation model as of May 28, 2026, based on human aesthetic judgment rather than automated benchmarks. With closed models included, it placed fourth overall, ahead of Nano Banana Pro. The video model Cosmos3-Super-Image2Video also led the open category and ranked 22nd overall. Nvidia plans to release Cosmos3-Edge, a real-time processing variant, soon.
- Cosmos3-Nano: 16-billion-parameter multimodal model supporting text, image, video, audio, and action input/output
- Cosmos3-Super: 65-billion-parameter multimodal model supporting text, image, video, audio, and action input/output
- Cosmos3-Nano-Policy-DROID: 16-billion-parameter multimodal model capable of robot motion control
- Cosmos3-Super-Text2Image: 65-billion-parameter image generation model; top open model per Artificial Analysis
- Cosmos3-Super-Image2Video: 65-billion-parameter video generation model; top open model per Artificial Analysis
Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.