
LPM 1.0 generates 45-minute real-time lip-synced video from single photo, no public release planned

TL;DR

Researchers have introduced LPM 1.0, an AI model that generates real-time video of a speaking, listening, or singing character from a single image, with lip-synced speech and facial expressions stable for up to 45 minutes. The system integrates directly with voice AI models like ChatGPT but remains a research project with no planned public release.


Researchers have introduced LPM 1.0, an AI model that generates real-time video of a speaking, listening, or singing character from a single image, complete with lip-synced speech and facial expressions. The researchers claim the output remains stable for videos up to 45 minutes long, and the system integrates directly with voice AI systems including ChatGPT and Doubao.

Technical capabilities

LPM 1.0 processes text, audio, and reference images simultaneously to produce synchronized speech, subtle facial expressions including hesitation and gaze shifts, and emotional transitions. The model uses what researchers call "multi-granularity identity conditioning": it receives a main image plus reference images showing different angles and facial expressions, which lets it render details like teeth, emotion-specific wrinkles, and profile views directly from source material rather than inventing them.

The system operates as a streaming process rather than rendering complete videos at once. According to the researchers, videos up to 45 minutes remain stable during real-time generation.
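To make the streaming distinction concrete, here is a minimal Python sketch of chunk-by-chunk generation. It is an illustration only, not the researchers' implementation: `render_chunk`, `stream_video`, the 25 fps frame rate, and the five-frames-per-chunk size are all hypothetical assumptions, and the "frames" are placeholder strings standing in for model output.

```python
from typing import Iterable, Iterator, List


def frames_needed(minutes: int, fps: int) -> int:
    """Number of frames in a video of the given length at the given frame rate."""
    return minutes * 60 * fps


def render_chunk(audio_chunk: List[float], frames_per_chunk: int) -> List[str]:
    """Hypothetical stand-in for the model: placeholder frames for one audio chunk."""
    return [f"frame<{len(audio_chunk)} samples>" for _ in range(frames_per_chunk)]


def stream_video(
    audio_chunks: Iterable[List[float]], frames_per_chunk: int = 5
) -> Iterator[str]:
    """Yield frames as each audio chunk arrives instead of rendering the whole video first."""
    for chunk in audio_chunks:
        # Frames for this chunk are available to the viewer before later audio exists.
        yield from render_chunk(chunk, frames_per_chunk)


if __name__ == "__main__":
    # Assumed frame rate of 25 fps; the article does not state one.
    print(frames_needed(45, 25))
    for frame in stream_video([[0.0] * 160, [0.0] * 160]):
        print(frame)
```

Because `stream_video` is a generator, the first frames can be displayed while later audio is still being produced, which is what a live conversation requires. At the assumed 25 fps, a 45-minute session comes to 67,500 frames, which gives a sense of how long the output has to stay consistent.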

The model works across visual styles including photorealistic faces, anime, and 3D game characters without additional training. It recognizes three conversational states: listening (generating reactive expressions like nodding based on incoming audio), speaking (driving lip movements and body language from response audio), and pausing (producing natural idle behavior from text instructions).
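The three conversational states can be pictured as a small state selector. The Python sketch below is an assumption-laden illustration: the names `ConversationState`, `Inputs`, and `select_state` are hypothetical, and the priority rule (response audio wins over incoming audio) is a guess, since the article does not say how LPM 1.0 switches between states. Only the mapping from each state to its driving signal comes from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class ConversationState(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"
    PAUSING = "pausing"


# Driving signal for each state, as described in the article.
DRIVING_SIGNAL = {
    ConversationState.LISTENING: "incoming audio",
    ConversationState.SPEAKING: "response audio",
    ConversationState.PAUSING: "text instructions",
}


@dataclass
class Inputs:
    incoming_audio: bool  # hypothetical flag: the user is currently talking
    response_audio: bool  # hypothetical flag: reply audio is ready to be voiced


def select_state(inputs: Inputs) -> ConversationState:
    """Pick a state from the available inputs (the priority order is an assumption)."""
    if inputs.response_audio:
        return ConversationState.SPEAKING
    if inputs.incoming_audio:
        return ConversationState.LISTENING
    return ConversationState.PAUSING


if __name__ == "__main__":
    state = select_state(Inputs(incoming_audio=True, response_audio=False))
    print(state.value, "driven by", DRIVING_SIGNAL[state])
```

A real system would likely need smoother handling of overlaps, such as a user interrupting mid-reply, which this sketch ignores.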

Integration and use cases

LPM 1.0 plugs directly into voice AI models to create visual conversation partners in real time. Beyond live conversation, the system supports offline video generation from existing audio files, which project manager Ailing Zeng says could be useful for podcasts or movie dialogue. Video-based input control is not included in the current version, though Zeng says the framework could support it in future iterations.

Research-only status

The development team emphasizes that LPM 1.0 is purely a research project with no plans to release model weights, code, or a public demo. All faces shown in demonstrations are AI-generated, not real people.

The researchers acknowledge that generated videos contain visible artifacts, and their quantitative analysis confirmed a noticeable gap compared to real video quality. The team states they would only consider opening access "if and when adequate safeguards and responsible-use frameworks are firmly in place."

What this means

LPM 1.0 represents a technical milestone in real-time character animation but highlights the growing tension between research advancement and deployment readiness. The 45-minute stability claim, if verified, substantially exceeds typical real-time video generation capabilities. The researchers' decision to withhold release acknowledges the immediate deepfake risk: real-time impersonation infrastructure could enable fraud and manipulation at scale. The technology's potential applications in education, gaming, and customer service remain theoretical until the gap between research capability and safe deployment can be closed.
