video-generation
33 articles tagged with video-generation
Google Vids Opens AI Avatar Feature to Free Users, Reaches 7M Monthly Active Users
Google Vids now offers AI avatars to free personal accounts, providing 10 video generations per month that can be split between avatars and Veo video generation. The Workspace video creation tool has reached 7 million monthly active users.
Google cuts AI Plus subscription to $5/month, doubles storage to 400GB
Google lowered its AI Plus subscription from $8 to $5 per month and doubled included storage from 200GB to 400GB. The plan includes access to Gemini 3 Pro, Nano Banana Pro, Deep Research, and the newly announced Gemini Omni video generation model.
ByteDance Open-Sources Bernini-R Video Diffusion Model With Semantic Planning Architecture
ByteDance released Bernini-R, an open-source video generation and editing model that combines an MLLM-based semantic planner with a DiT-based renderer. The model requires Hopper-class GPUs (H100/H800/H200) for optimal performance and supports multiple tasks including text-to-video, video editing, and reference-guided generation.
NVIDIA Releases Cosmos 3 Video Generation Models in Three Sizes: Nano, Super, and Super-Image2Video
NVIDIA has released three variants of its Cosmos 3 video generation model family on Hugging Face: Cosmos3-Nano, Cosmos3-Super, and Cosmos3-Super-Image2Video. The release includes models for both standard video generation and specialized image-to-video conversion, though detailed specifications including parameter counts and benchmark scores have not yet been disclosed.
NVIDIA Releases Cosmos 3: 64B-Parameter Omnimodal World Model for Physical AI
NVIDIA released Cosmos 3, an omnimodal world foundation model platform for Physical AI spanning robotics, autonomous driving, and industrial environments. The flagship Cosmos3-Super variant contains 64 billion parameters and generates video, images, audio, and action commands from text, image, video, and action trajectory inputs using a Mixture-of-Transformers architecture.
NVIDIA Releases Cosmos3-Super: 64B-Parameter Omnimodal World Model for Physical AI
NVIDIA released Cosmos3-Super, a 64-billion parameter omnimodal foundation model that generates video, images, audio, and action commands from combinations of text, image, video, and action trajectory inputs. The model, part of the Cosmos3 collection, targets Physical AI applications including robotics, autonomous vehicles, and industrial automation.
NVIDIA Releases Cosmos3-Nano: 16B-Parameter Omnimodal World Model for Physical AI with 256K Token Context
NVIDIA has released Cosmos3-Nano, a 16-billion parameter omnimodal world model capable of generating video, audio, images, and robot action commands from combinations of text, image, video, and action trajectory inputs. The model supports a 256K token context window and is designed for Physical AI applications including robotics, autonomous vehicles, and smart manufacturing environments.
NVIDIA Releases Cosmos 3: 8B and 32B Omni-Models Combining Video Generation, Reasoning, and Action in Single Architecture
NVIDIA has released Cosmos 3, a unified omni-model that combines world generation, physical reasoning, and action generation in a single architecture. Available in 8B (Nano) and 32B (Super) parameter versions on Hugging Face, Cosmos 3 uses a Mixture-of-Transformers architecture to process text, image, video, audio, and action modalities without switching between separate models.
Google launches Gemini Omni, multimodal AI video generator with avatar cloning and physics modeling
Google has released Gemini Omni, a multimodal AI video generation tool that accepts text, images, audio, and video as inputs. The first tier, Gemini Omni Flash, includes avatar cloning that creates digital versions of users and incorporates physics modeling for realistic motion.
Google cuts AI Ultra plan to $200/month, launches new $100 developer tier
Google announced pricing changes to its Gemini AI subscription tiers at I/O 2026, cutting its top AI Ultra plan from $250 to $200 per month while introducing a new $100/month developer-focused tier. All plans now get access to Gemini 3.5 Flash and the new Gemini Omni video generation model.
Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis
Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.
NVIDIA releases LoRA/DoRA fine-tuning guide for Cosmos Predict 2.5 to generate synthetic robot training data
NVIDIA published a technical guide for parameter-efficient fine-tuning of its Cosmos Predict 2.5 world model using LoRA and DoRA adapters. The method allows teams to adapt the 2B-parameter model to robot manipulation tasks on a single 80GB GPU, generating synthetic training trajectories from just 92 demonstration videos.
Google adds Veo 3.1 Lite to Ultra subscriptions at zero credit cost starting May 10
Google is adding Veo 3.1 Lite to Ultra subscriptions at zero credit cost starting May 10, 2026. According to Google, the model costs less than half as much as Veo 3.1 Fast while generating videos at the same speed, though quality tradeoffs remain unclear.
Google launches AI avatar tool for YouTube Shorts creators
YouTube is rolling out an AI avatar feature that lets creators generate digital versions of themselves for use in Shorts videos. The tool requires users to record a "live selfie" with face and voice data, generates clips up to 8 seconds long, and marks all AI-generated content with watermarks and digital labels.
YouTube Shorts adds AI avatars that replicate your voice and appearance
YouTube is rolling out an AI avatar feature that lets users create photorealistic versions of themselves for YouTube Shorts. Users record a live selfie and voice prompts to generate an avatar that can create up to 8-second video clips. The feature includes watermarks, digital labels (SynthID and C2PA), and AI-generated content disclosures.
Tencent releases OmniWeaving, open-source video generation model with reasoning and multi-modal composition
Tencent's Hunyuan team released OmniWeaving on April 3, 2026, an open-source video generation model designed to compete with proprietary systems like Seedance-2.0. The model combines multimodal composition with reasoning-informed capabilities and supports eight video generation tasks, including text-to-video, image-to-video, video editing, and compositional generation.
Microsoft releases three multimodal AI models to compete with OpenAI and Google
Microsoft AI released three foundational models on April 2: MAI-Transcribe-1 for speech-to-text across 25 languages, MAI-Voice-1 for audio generation, and MAI-Image-2 for image generation. The company positions these models as cheaper alternatives to Google and OpenAI offerings. Models are available on Microsoft Foundry with pricing starting at $0.36 per hour for transcription.
Google launches Veo 3.1 Lite, cutting video generation costs by half
Google announced Veo 3.1 Lite, a cost-reduced video generation model priced at less than 50% of Veo 3.1 Fast's cost. The model supports text-to-video and image-to-video generation at 720p or 1080p resolution with customizable durations of 4s, 6s, or 8s, rolling out today on the Gemini API and Google AI Studio.
OpenAI shuts down Sora app April 2026, API follows September 2026
OpenAI is shutting down Sora in two phases: the web app and mobile application close April 26, 2026, followed by the API on September 24, 2026. Users must export their videos and images before the cutoff dates, as user data will be permanently deleted afterward. The discontinuation reflects OpenAI's strategic shift toward coding tools and enterprise products.
ByteDance rolls out Dreamina Seedance 2.0 video generation to CapCut with IP safeguards
ByteDance confirmed Thursday that Dreamina Seedance 2.0, its audio and video generation model, is rolling out in CapCut across seven initial markets. The model generates videos up to 15 seconds with realistic textures and motion, but includes safety restrictions blocking generation from real faces and unauthorized IP use.
OpenAI shutters Sora video tool after Disney deal collapse, signaling shift to enterprise focus
OpenAI announced the shutdown of its Sora video generation app on Tuesday via an X post, just two days after publishing usage guidelines and following Disney's withdrawal from a proposed $1 billion investment deal. The move represents OpenAI's second major product discontinuation in recent months, after deprecating GPT-4o in January with two weeks' notice.
OpenAI shuts down Sora app with no explanation; Disney deal collapses
OpenAI announced the shutdown of its Sora standalone video generation app on X, though the company provided no explanation for the decision. The closure kills a partnership deal with Disney that would have allowed Sora to generate videos using Disney IP. Video generation capabilities may remain available through other OpenAI channels.
OpenAI shutting down Sora video app 15 months after launch
OpenAI announced Tuesday that it will shut down Sora, the video generation app that launched publicly in December 2024. The shutdown comes as OpenAI refocuses on business and productivity applications and faces intensifying competition from ByteDance's Seedance 2.0 and Google's Veo.
OpenAI shuts down Sora video app amid declining user engagement and strategic shift
OpenAI announced the discontinuation of Sora, its consumer video generation app and API. The shutdown follows declining user engagement and aligns with OpenAI's strategic pivot toward enterprise AI products and robotics research, with Disney exiting a planned $1 billion investment.
OpenAI discontinues Sora video generator, ending $1B Disney deal
OpenAI announced Tuesday that it is discontinuing Sora, its video generation tool launched in late 2024, including both the standalone app and developer API access. The shutdown also terminates Disney's $1 billion investment deal announced in December, which included licensing Disney characters for use within Sora and plans to distribute AI-generated videos on Disney+.
Stability AI releases Stable Virtual Camera for 3D multi-view video generation from 2D images
Stability AI has introduced Stable Virtual Camera, a multi-view diffusion model currently in research preview that generates 3D videos from 2D images with realistic depth and perspective transformations. The model requires no complex scene reconstruction or scene-specific optimization, enabling direct camera control across multiple viewpoints.
OpenAI plans to integrate Sora video generation directly into ChatGPT
OpenAI plans to add its Sora video generation model directly to ChatGPT, according to The Information. The move comes as the standalone Sora app, launched in September 2025, has seen declining engagement despite strong initial adoption. Integration could help grow ChatGPT's 900 million weekly active users toward 1 billion.
OpenAI Python SDK v2.27.0 adds Sora video API improvements and character support
OpenAI released version 2.27.0 of its Python SDK on March 13, 2026, adding significant improvements to the Sora video generation API. The update introduces character API support, video extension and editing capabilities, and higher resolution export options.
OpenAI plans to integrate Sora video generator directly into ChatGPT
OpenAI plans to integrate its Sora video generator as a built-in feature within ChatGPT, according to The Information. Currently available only on a standalone website and app, the integration would let users generate videos directly in the chatbot, similar to how image generation was added last year.
ByteDance's Helios reaches 19.5 FPS for minute-long video generation on single GPU
ByteDance has released Helios, a 14-billion-parameter open-weight video generation model that achieves 19.5 frames per second on a single GPU while generating minute-long video clips. The researchers claim this is the first model of its scale to reach near-real-time performance at this duration. Code and model weights are publicly available.
Google NotebookLM adds Cinematic Video Overviews and upgrades AI Mode Canvas
Google is expanding NotebookLM's video generation capabilities with Cinematic Video Overviews, which produce styled visual narratives beyond simple slide presentations. The update also includes upgrades to Canvas in AI Mode, enhancing the tool's ability to synthesize and present document insights.
Google NotebookLM now generates fully animated 'cinematic' videos from research notes
Google has upgraded NotebookLM's video overview feature to generate fully animated videos from research notes and documents, moving beyond the previous narrated slideshow format. The new capability uses multiple Google AI models including Gemini 3 and Veo 3 to automatically create visual content that matches the narrative.
Gemini app adds video templates for quicker content generation
Google has rolled out video templates in its Gemini app, enabling users to generate video content more quickly through pre-built starting points. This update follows the introduction of music generation capabilities the previous week.