robotics
15 articles tagged with robotics
NVIDIA Releases Cosmos3-Super-Text2Image: 64B Parameter Model for Physical AI Applications
NVIDIA released Cosmos3-Super-Text2Image, a 64-billion parameter text-to-image generation model, as part of its Cosmos3 collection of omnimodal world models. The model uses a Mixture-of-Transformers architecture that combines autoregressive and diffusion transformers, and it is designed for Physical AI applications including robotics and autonomous vehicles.
NVIDIA Releases Cosmos 3: 64B-Parameter Omnimodal World Model for Physical AI
NVIDIA released Cosmos 3, an omnimodal world foundation model platform for Physical AI spanning robotics, autonomous driving, and industrial environments. The flagship Cosmos3-Super variant contains 64 billion parameters and generates video, images, audio, and action commands from text, image, video, and action trajectory inputs using a Mixture-of-Transformers architecture.
NVIDIA Releases Cosmos3-Super: 64B-Parameter Omnimodal World Model for Physical AI
NVIDIA released Cosmos3-Super, a 64-billion parameter omnimodal foundation model that generates video, images, audio, and action commands from combinations of text, image, video, and action trajectory inputs. The model, part of the Cosmos3 collection, targets Physical AI applications including robotics, autonomous vehicles, and industrial automation.
NVIDIA Releases Cosmos3-Nano: 16B-Parameter Omnimodal World Model for Physical AI with 256K Token Context
NVIDIA has released Cosmos3-Nano, a 16-billion parameter omnimodal world model capable of generating video, audio, images, and robot action commands from combinations of text, image, video, and action trajectory inputs. The model supports a 256K token context window and is designed for Physical AI applications including robotics, autonomous vehicles, and smart manufacturing environments.
NVIDIA Releases Cosmos 3: 8B and 32B Omni-Models Combining Video Generation, Reasoning, and Action in Single Architecture
NVIDIA has released Cosmos 3, a unified omni-model that combines world generation, physical reasoning, and action generation in a single architecture. Available in 8B (Nano) and 32B (Super) parameter versions on Hugging Face, Cosmos 3 uses a Mixture-of-Transformers architecture to process text, image, video, audio, and action modalities without switching between separate models.
NVIDIA releases LocateAnything-3B vision-language model with 2.5× faster object detection via parallel box decoding
NVIDIA released LocateAnything-3B, a 3-billion parameter vision-language model that predicts bounding boxes in parallel rather than token-by-token, achieving up to 2.5× higher throughput compared to autoregressive approaches. The model, trained on 12M images with 138M+ queries and 785M bounding boxes, supports object detection, GUI element grounding, and robotics perception.
Google DeepMind connects Genie world model to 280 billion Street View images, Waymo already using for self-driving training
Google DeepMind has integrated its Genie world model with Street View's 280 billion images spanning 110 countries, enabling users to explore AI-generated simulations of real locations. Waymo is already using Genie 3 to train self-driving cars on rare scenarios like tornadoes and unexpected obstacles.
Google DeepMind Integrates Street View With Genie 3 World Model for Real-World Environment Simulation
Google DeepMind launched Street View integration with its Genie 3 world model at I/O 2026, allowing users to simulate real-world locations from 280 billion images across 110 countries. The feature lets users modify environments, including changing the weather, and supports robotics training. Access starts with U.S. Ultra subscribers before expanding globally.
NVIDIA releases LoRA/DoRA fine-tuning guide for Cosmos Predict 2.5 to generate synthetic robot training data
NVIDIA published a technical guide for parameter-efficient fine-tuning of its Cosmos Predict 2.5 world model using LoRA and DoRA adapters. The method allows teams to adapt the 2B-parameter model to robot manipulation tasks on a single 80GB GPU, generating synthetic training trajectories from just 92 demonstration videos.
NVIDIA Releases GR00T N1.7, 3B-Parameter Open-Source Humanoid Robot Model Trained on 20,854 Hours of Human Video
NVIDIA released GR00T N1.7, a 3-billion parameter open-source Vision-Language-Action model for humanoid robots with commercial licensing. The model was trained on 20,854 hours of human egocentric video data and demonstrates the first documented scaling law for robot dexterity, where increasing human video data from 1,000 to 20,000 hours more than doubles task completion rates.
Physical Intelligence's π0.7 robot model performs tasks outside its training data
Physical Intelligence published research showing its π0.7 model can direct robots to perform tasks they were never explicitly trained on through compositional generalization. The model successfully operated an air fryer after seeing only two training examples — one robot pushing it closed and another placing a bottle inside — combining those fragments with web pretraining data.
AI2 releases robotics models trained entirely in simulation, achieving zero-shot real-world transfer
AI2 has released MolmoSpaces and MolmoBot, robotics models trained exclusively in simulation that transfer directly to real robots without manual real-world data collection or fine-tuning. The approach eliminates the months of teleoperated demonstrations that simulation-trained robots typically still require. Both systems are open-source.
AI2 uses virtual simulation data to train physical AI robots, reducing real-world data costs
AI2 is developing physical AI systems trained primarily on virtual simulation data rather than expensive real-world demonstrations. The approach, demonstrated through projects like MolmoBot, addresses the historical bottleneck of manually collecting hardware training data.
ABB and NVIDIA partnership shows physical AI simulation driving factory automation ROI
ABB and NVIDIA have partnered to deploy physical AI simulation in factory automation, addressing the critical sim-to-real gap that has limited intelligent robotics deployment. The approach uses digital physics simulation to train models that transfer reliably to actual factory floors, reducing production hurdles and securing measurable ROI.
Fei-Fei Li's World Labs raises $1B to develop spatial intelligence AI systems
World Labs, the AI startup founded by Fei-Fei Li, has raised $1 billion in new funding to develop spatial intelligence—AI systems capable of understanding and operating in three-dimensional physical environments. The capital will fund the development of world models, a class of AI architecture designed to reason about spatial relationships and physical interactions.