Google launches Gemini Omni, multimodal AI video generator with avatar cloning and physics modeling

TL;DR

Google has released Gemini Omni, a multimodal AI video generation tool that accepts text, images, audio, and video as inputs. The first tier, Gemini Omni Flash, includes avatar cloning that creates digital versions of users and incorporates physics modeling for realistic motion.

Google has released Gemini Omni, a multimodal AI video generation tool that the company positions as doing "for video what Nano Banana did for images." The first tier, Gemini Omni Flash, is now rolling out to the Gemini app, Google Flow, and YouTube Shorts.

Core capabilities

According to Google, Omni accepts four input types: text, images, audio (currently voice recordings only), and video. The company claims the model can "create anything from any input," with plans to expand beyond video generation. The tool embeds SynthID digital watermarks so that its output can be identified as AI-generated.

Avatar cloning feature

Omni includes an Avatars feature that creates digital replicas of users, generating videos that "look and sound like you," according to Google. The company said it is "still working to test" the ability to edit the audio and speech in existing videos, citing responsible deployment concerns.

Physics modeling

The model incorporates what Google describes as "an improved intuitive understanding of forces like gravity, kinetic energy, and fluid dynamics." This physics modeling aims to produce more realistic motion than earlier AI video tools that treated objects like ragdolls rather than physical entities.

Natural language editing

Omni supports conversational video editing through natural language instructions. Google claims that "every instruction builds on the last" while maintaining character consistency and scene continuity. The company said users can "change specific things, or change everything" in existing videos, including adding characters, transforming objects, or altering backgrounds.

Google has not disclosed video resolution limits, supported aspect ratios, maximum clip length, or pricing per plan tier. The company also has not specified whether Omni will integrate with professional editing software like Final Cut, Premiere Pro, or DaVinci Resolve.

Availability

Gemini Omni Flash is rolling out now through the Gemini app, Google Flow, and YouTube Shorts. Google has not announced whether the web version of Gemini will support Omni or whether users must access it through the Flow interface.

What this means

Gemini Omni represents Google's entry into the competitive AI video generation space, directly challenging OpenAI's Sora. The physics modeling and multimodal input capabilities address key weaknesses in earlier AI video tools. However, the lack of disclosed specifications, particularly around resolution, format support, and professional workflow integration, leaves open whether the tool targets casual creators or professional production environments. The avatar cloning feature also introduces significant trust and verification challenges for video content, even with SynthID watermarking.
