AI2 releases robotics models trained entirely in simulation, achieving zero-shot real-world transfer

TL;DR

AI2 has released MolmoSpaces and MolmoBot, robotics models trained exclusively in simulation that transfer directly to real robots without manual real-world data collection or fine-tuning. The approach eliminates the months of teleoperated demonstrations typically required to make simulation-trained robots reliable. Both systems are open-source.


AI research institute AI2 has released two open-source robotics models—MolmoSpaces and MolmoBot—trained exclusively in simulation that achieve zero-shot transfer to real robots without any manually collected real-world data or fine-tuning.

What's New

The models represent a significant shift in robotics training methodology. Conventional approaches require researchers to spend months collecting teleoperated real-world demonstrations before simulation-trained robots become reliable. AI2's approach eliminates this bottleneck entirely.

MolmoSpaces, the foundation dataset, contains:

  • 230,000+ indoor scenes
  • 130,000+ curated objects
  • 42 million physics-based robotic grasping annotations
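To make those scale numbers concrete, one grasp annotation can be pictured as a record like the following. This is a purely hypothetical schema for illustration; the article does not describe MolmoSpaces' actual data format, and every field name here is invented.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of what one physics-based grasp annotation might
# contain. Field names are invented; MolmoSpaces' real schema may differ.

@dataclass
class GraspAnnotation:
    object_id: str                   # one of the 130K+ curated objects
    scene_id: str                    # one of the 230K+ indoor scenes
    gripper_pose: tuple[float, ...]  # position (x, y, z) + orientation quaternion
    success: bool                    # outcome of the simulated grasp attempt

@dataclass
class Scene:
    scene_id: str
    objects: list[str] = field(default_factory=list)
    grasps: list[GraspAnnotation] = field(default_factory=list)

# Rough arithmetic on the published scale: 42M annotations spread over
# 230K scenes averages about 180 grasp attempts per scene.
avg_grasps_per_scene = 42_000_000 // 230_000
```

At this density, each scene carries enough positive and negative grasp outcomes to supervise a policy without any real-world trials.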

MolmoBot, built on MolmoSpaces, demonstrates capabilities including:

  • Object picking and placement
  • Opening drawers
  • Operating doors

All tasks execute without training data from real-world demonstrations.

The Technical Approach

According to Ranjay Krishna, director of the PRIOR team at AI2, the key insight is straightforward: the simulation-to-reality gap shrinks dramatically when researchers increase the variety of simulated environments, objects, and camera conditions. Rather than improving physics simulation fidelity, the models benefit from diversity in training conditions.

This aligns with recent findings in robotics research showing that breadth of training distribution often matters more than pixel-perfect simulation accuracy. By exposing models to hundreds of thousands of variations in scene configuration, object types, and viewpoints, the models learn generalizable behaviors that transfer directly to physical systems.
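The idea of trading fidelity for breadth is often implemented as domain randomization: rather than tuning one scene to match reality, the training pipeline samples huge numbers of varied scenes so the policy cannot overfit to any single rendering of the world. The sketch below illustrates that sampling loop only; the function names and parameter ranges are assumptions for illustration, not AI2's actual pipeline.

```python
import random

def sample_training_scene(rng: random.Random) -> dict:
    """Draw one randomized scene configuration.

    Each call varies layout, clutter, camera, lighting, and contact
    physics, so no two training scenes look or behave quite alike.
    Ranges are invented for illustration.
    """
    return {
        "room_layout": rng.choice(["kitchen", "office", "bedroom", "bathroom"]),
        "object_count": rng.randint(3, 30),
        "camera_height_m": rng.uniform(0.4, 1.8),
        "camera_fov_deg": rng.uniform(45.0, 90.0),
        "light_intensity": rng.uniform(0.2, 2.0),
        "table_friction": rng.uniform(0.3, 1.2),
    }

def build_curriculum(num_scenes: int, seed: int = 0) -> list[dict]:
    """Generate a reproducible, high-variance set of training scenes."""
    rng = random.Random(seed)
    return [sample_training_scene(rng) for _ in range(num_scenes)]

scenes = build_curriculum(100_000)
```

The key design choice is that variance lives in the sampler, not the simulator: a policy trained across `scenes` must succeed under every camera height and friction value it saw, which is what lets it tolerate the one real-world configuration it has never seen.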

Open-Source Release

Both models and supporting tools are available publicly. Technical details are available in the accompanying research paper. The open-source approach allows other research groups and robotics companies to build on the foundation rather than starting from zero with their own simulation data collection.

What This Means

This work addresses one of robotics' most significant friction points: the cost and time required to train deployable systems. If the zero-shot transfer results hold up in broader testing, the implications are substantial. Companies and research labs could dramatically reduce development timelines, from months of manual demonstration collection to weeks of model training. This could accelerate the deployment of robotic manipulation in warehousing, manufacturing, and service robotics.

The emphasis on simulation diversity over physics accuracy also reframes how the robotics community should approach simulation tools. Rather than competing on fidelity, platforms that generate high-variance synthetic training data may prove more valuable. This could shift investment and resource allocation within the robotics software ecosystem.
