Google DeepMind connects Genie world model to 280 billion Street View images, Waymo already using it for self-driving training
Google DeepMind has integrated its Genie world model with Street View's 280 billion images spanning 110 countries, enabling users to explore AI-generated simulations of real locations. Waymo is already using Genie 3 to train self-driving cars on rare scenarios like tornadoes and unexpected obstacles.
Google DeepMind announced at Google I/O that its Genie world model can now access Street View's 280 billion images captured across 110 countries and seven continents. The integration allows users to navigate AI-generated simulations of real locations, from snow-covered New York City blocks to London streets.
Genie 3 timeline and access
Genie 3 first appeared as a research preview in August 2025. DeepMind opened access to Google AI Ultra subscribers in the United States in January 2026. The Street View integration is now rolling out to some Ultra users in the US, with global expansion planned in the coming weeks.
Waymo deployment for autonomous vehicle training
Waymo is already using Genie 3 in production to train self-driving cars on rare scenarios that would be dangerous or impractical to stage in real life, including tornadoes and unexpected encounters with elephants on roads. The Street View integration adds geographic realism to these training simulations.
Current limitations
According to Diego Rivas, product manager at DeepMind, the generated environments look closer to video games than photographs. The model is not yet physics-aware—demonstrations showed characters running through cacti without consequences. Research scientist Jack Parker-Holder estimates interactive world generation trails video generation by six to 12 months in accuracy.
For comparison, Google's Veo model already understands basic physics, and its Nano Banana tool can render accurate text in infographics. Genie has not reached that level.
Spatial continuity advantage
Jonathan Herbert, director of Google Maps, highlighted that Genie maintains spatial continuity. Users can turn 360 degrees inside a generated environment, and the AI remembers what was behind them rather than regenerating the scene from scratch with each viewpoint shift.
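To make the distinction concrete, here is a toy Python sketch contrasting a world that regenerates each view with one that remembers what it has already shown. It is purely illustrative and is not how Genie works internally, which the announcement does not describe. The class names, the `look` method, and the use of a random integer as a stand-in for a generated frame are all assumptions made for the example.

```python
import random


class MemorylessWorld:
    """Hypothetical: regenerates the scene on every look, so a revisited heading can change."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def look(self, heading_deg: int) -> int:
        # A fresh random "scene id" on each call stands in for a newly generated frame.
        return self._rng.randrange(1_000_000)


class PersistentWorld:
    """Hypothetical: stores what was generated per heading and reuses it on return."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._seen: dict[int, int] = {}

    def look(self, heading_deg: int) -> int:
        key = heading_deg % 360  # 360 degrees is the same direction as 0
        if key not in self._seen:
            self._seen[key] = self._rng.randrange(1_000_000)
        return self._seen[key]


def full_turn(world, step_deg: int = 90) -> list[int]:
    """Look at each heading over one full rotation, ending back at the start."""
    return [world.look(h) for h in range(0, 361, step_deg)]
```

Calling `full_turn(PersistentWorld())` returns a list whose first and last entries match, because the final look at 360 degrees reuses what was stored for 0 degrees. `MemorylessWorld` offers no such guarantee, which is the failure mode Herbert describes Genie as avoiding.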
Two use cases
Parker-Holder identified two distinct audiences: robotics developers training agents in simulated environments that mirror actual locations, and ordinary users exploring for entertainment. The simulation-to-reality pipeline is a critical bottleneck in physical AI, with companies including Nvidia and Cadence working on similar problems.
What this means
Street View's dataset represents a competitive moat that no other AI lab can easily replicate—20 years of imagery across 110 countries. By connecting this data to a generative world model, Google transforms a passive mapping tool into an interactive training ground for robotics and autonomous vehicles. The Waymo deployment demonstrates immediate practical value beyond consumer exploration.
The six-to-12-month lag behind video generation quality suggests rapid improvement is possible, but the current lack of physics awareness limits immediate applications. The spatial continuity feature indicates DeepMind is solving fundamental challenges in maintaining coherent 3D representations rather than simply generating impressive individual frames.