Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech editing
Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.
Google DeepMind released Gemini Omni Flash, a multimodal video generation model that accepts any combination of image, audio, video, and text inputs, on Tuesday at I/O 2026. The model began rolling out the same day to Gemini app users with AI Plus, Pro, and Ultra subscriptions, as well as to YouTube Shorts and the YouTube Create app.
Model capabilities and limits
Gemini Omni Flash generates video clips capped at 10 seconds at launch. According to Koray Kavukcuoglu, CTO of Google DeepMind, the model supports conversational editing, in which each instruction builds on the previous ones while character identity and scene continuity are preserved across multiple turns.
Google claims the model has improved physics simulation, including gravity, kinetic energy, and fluid dynamics. The company demonstrated prompts ranging from claymation protein-folding animations to chain-reaction physics tracks, but it has not disclosed benchmark scores comparing Omni to Veo 3 or to competing models such as ByteDance's Seedance.
The 10-second limit is shorter than that of OpenAI's Sora, which generates clips up to 60 seconds. Google has not disclosed per-clip costs, compute footprint per generation, or pricing for tiers beyond Flash.
Speech editing withheld, SynthID mandatory
Google is explicitly withholding general-purpose audio and speech editing capabilities. "We are still working to test this and better understand how we can bring this capability to users responsibly," Kavukcuoglu wrote, in what appears to be a deliberate step back from deepfake territory.
The model includes digital avatar generation, which requires users to record their voice and likeness by speaking a series of numbers aloud during onboarding.
All videos generated with Omni carry Google's SynthID imperceptible digital watermark by default. Users can verify whether a clip was generated by Omni through the Gemini app, Gemini in Chrome, and Google Search. The SynthID implementation follows the C2PA open standard that OpenAI adopted earlier this year.
Availability and rollout
API access for developers and enterprise customers will launch in the coming weeks. Google has not disclosed how the underlying model architecture relates to Veo 3, its benchmark evaluation methodology, or a timeline for enabling speech editing across the Omni family.
The model launched alongside Gemini 3.5 and other announcements at I/O 2026 that Sundar Pichai framed as the "agentic Gemini era" in his keynote.
What this means
Google's decision to withhold speech editing while releasing video generation marks a more conservative deployment strategy than that of frontier competitors. The 10-second clip limit and the lack of disclosed pricing leave it unclear whether Omni represents a technical advance or primarily a product integration of existing capabilities. The mandatory SynthID watermarking positions Google as prioritizing provenance over feature completeness, though the effectiveness of imperceptible watermarks against adversarial removal remains an open question. The API rollout in the coming weeks should reveal whether the cost structure, and any extended clip lengths under paid tiers, can compete with Sora's current specifications.