Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis
Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.
Google DeepMind released Gemini Omni Flash on Tuesday at I/O 2026, a multimodal video generation model that accepts any combination of image, audio, video, and text inputs. The model began rolling out the same day to Gemini app users with AI Plus, Pro, or Ultra subscriptions, as well as to YouTube Shorts and the YouTube Create app.
Model capabilities and limits
Gemini Omni Flash generates video clips capped at 10 seconds at launch. According to Koray Kavukcuoglu, CTO of Google DeepMind, the model supports conversational editing, in which each instruction builds on the previous ones while character identity and scene continuity are preserved across multiple turns.
Google claims the model has improved physics simulation, including gravity, kinetic energy, and fluid dynamics. The company demonstrated prompts ranging from claymation protein-folding animations to chain-reaction physics tracks, but it has not disclosed benchmark scores comparing Omni with Veo 3 or with competing models such as ByteDance's Seedance.
The 10-second limit is shorter than OpenAI's Sora, which generates clips up to 60 seconds. Google has not disclosed per-clip costs, compute footprint per generation, or pricing for tiers beyond Flash.
Speech editing withheld, SynthID mandatory
Google is explicitly withholding general-purpose audio and speech editing capabilities. "We are still working to test this and better understand how we can bring this capability to users responsibly," Kavukcuoglu wrote, in what appears to be a deliberate step back from deepfake territory.
The model includes digital avatar generation, which requires users to record their voice and likeness by speaking a series of numbers aloud during onboarding.
All videos generated with Omni carry Google's SynthID imperceptible digital watermark by default. Users can verify whether a clip was generated by Omni through the Gemini app, Gemini in Chrome, and Google Search. The SynthID implementation follows the C2PA open standard that OpenAI adopted earlier this year.
Availability and rollout
API access for developers and enterprise customers will launch in the coming weeks. Google has not disclosed how the underlying model architecture relates to Veo 3, its benchmark evaluation methodology, or a timeline for enabling speech editing across the Omni family.
The model launched alongside Gemini 3.5 and other announcements at I/O 2026 that Sundar Pichai framed as the "agentic Gemini era" in his keynote.
What this means
Google's decision to withhold speech editing while releasing video generation marks a conservative deployment strategy compared with frontier competitors. The 10-second clip limit and the lack of disclosed pricing leave it unclear whether Omni represents a technical advance or primarily a product integration of existing capabilities. The mandatory SynthID watermarking positions Google as prioritizing provenance over feature completeness, though the effectiveness of imperceptible watermarks against adversarial removal remains an open question. The API rollout in the coming weeks should show whether Omni's cost structure, and any longer clip lengths offered under paid tiers, can compete with Sora's current specifications.