model release

Google launches Gemini Omni, multimodal AI video generator with avatar cloning and physics modeling

TL;DR

Google has released Gemini Omni, a multimodal AI video generation tool that accepts text, images, audio, and video as inputs. The first tier, Gemini Omni Flash, includes avatar cloning that creates digital versions of users and incorporates physics modeling for realistic motion.

May 27, 2026 · 1:51 AM · 2 min read

Google has released Gemini Omni, a multimodal AI video generation tool that the company positions as doing "for video what Nano Banana did for images." The first tier, Gemini Omni Flash, is now rolling out to the Gemini app, Google Flow, and YouTube Shorts.

Core capabilities

According to Google, Omni accepts four input types: text, images, audio (currently voice recordings only), and video. The company claims the model can "create anything from any input," with plans to expand beyond video generation. The tool incorporates SynthID digital watermarking technology to identify AI-generated content.

Avatar cloning feature

Omni includes an Avatars feature that creates digital replicas of users, generating videos that "look and sound like you," according to Google. The company stated it is "still working to test" the capability to edit videos to change audio and speech, citing responsible deployment concerns.

Physics modeling

The model incorporates what Google describes as "an improved intuitive understanding of forces like gravity, kinetic energy, and fluid dynamics." This physics modeling aims to create more realistic motion compared to earlier AI video tools that treated objects like ragdolls rather than physical entities.

Natural language editing

Omni supports conversational video editing through natural language instructions. Google claims that "every instruction builds on the last" while maintaining character consistency and scene continuity. The company said users can "change specific things, or change everything" in existing videos, including adding characters, transforming objects, or altering backgrounds.

Google has not disclosed video resolution limits, supported aspect ratios, maximum clip length, or pricing per plan tier. The company also has not specified whether Omni will integrate with professional editing software like Final Cut, Premiere Pro, or DaVinci Resolve.

Availability

Gemini Omni Flash is rolling out now through the Gemini app, Google Flow, and YouTube Shorts. Google has not announced whether the web version of Gemini will support Omni or whether users must access it through the Flow interface.

What this means

Gemini Omni represents Google's entry into the competitive AI video generation space, directly challenging OpenAI's Sora. The physics modeling and multimodal input capabilities address key weaknesses in earlier AI video tools. However, the lack of disclosed specifications—particularly around resolution, format support, and professional workflow integration—leaves open questions about whether this targets casual creators or professional production environments. The avatar cloning feature introduces significant trust and verification challenges for video content, even with SynthID watermarking.

Source: zdnet.com

