LLM News

Every LLM release, update, and milestone.

benchmark

MPCEval benchmark reveals multi-party conversation generation lags on speaker modeling and consistency

Researchers introduced MPCEval, a reference-free benchmark for evaluating multi-party conversation generation, a capability increasingly used in smart replies and collaborative AI assistants. The benchmark decomposes conversation quality into three dimensions: speaker modeling, content quality, and speaker-content consistency. Testing on public and real-world datasets revealed that single-score metrics obscure fundamental differences in how models handle complex conversational behavior, with current models struggling in particular with turn-taking, participation balance, and maintaining consistent role-dependent speech across longer exchanges.
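To make the decomposition concrete, here is a minimal sketch of multi-dimensional conversation scoring in the spirit described above. The three scorers are illustrative stand-ins, not MPCEval's actual metrics, and the heuristics (participation balance, length-based content proxy, speaker-switch rate) are assumptions for the example.

```python
# Hypothetical sketch: score a multi-party conversation on three separate
# dimensions instead of one aggregate number. All three heuristics below
# are stand-ins, not the benchmark's real metrics.

def score_conversation(turns):
    """Return per-dimension scores for a list of (speaker, text) turns."""
    speakers = [spk for spk, _ in turns]
    # Speaker modeling: participation balance (1.0 = perfectly even turns).
    counts = {s: speakers.count(s) for s in set(speakers)}
    balance = min(counts.values()) / max(counts.values())
    # Content quality: crude proxy via average turn length vs. a target.
    avg_len = sum(len(text.split()) for _, text in turns) / len(turns)
    content = min(avg_len / 10.0, 1.0)
    # Speaker-content consistency: fraction of adjacent turns that switch
    # speaker, a rough proxy for plausible turn-taking.
    switches = sum(a != b for a, b in zip(speakers, speakers[1:]))
    consistency = switches / (len(turns) - 1)
    return {"speaker_modeling": round(balance, 2),
            "content_quality": round(content, 2),
            "consistency": round(consistency, 2)}

convo = [("alice", "Shall we plan the release?"),
         ("bob", "Yes, I can draft the notes."),
         ("alice", "Great, I will review them."),
         ("alice", "Also we need a changelog.")]
print(score_conversation(convo))
```

Reporting the three numbers separately is the point: a model can look fine on an averaged score while one dimension (here, participation balance) is clearly weak.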

research

CoDAR framework closes the gap between continuous and discrete diffusion language models

A new paper identifies token rounding as the primary bottleneck limiting continuous diffusion language models (DLMs) and proposes CoDAR, a two-stage framework that maintains continuous embedding-space diffusion while using an autoregressive Transformer decoder for contextualized token discretization. Experiments on LM1B and OpenWebText show CoDAR achieves performance competitive with discrete diffusion approaches while offering tunable fluency-diversity trade-offs.
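The contrast between the two discretization strategies can be illustrated with a toy example. The tiny vocabulary, 2-D "embeddings", and the tie-breaking "decoder" below are hypothetical stand-ins; a real system would use a learned autoregressive decoder, not a hand-written rule.

```python
# Toy illustration of the discretization step: naive per-position rounding
# of a denoised embedding to its nearest vocabulary vector, vs. a
# context-aware decode that uses earlier tokens to resolve near-ties.
# Vocabulary and vectors are illustrative stand-ins only.

VOCAB = {"the": (0.0, 1.0), "a": (0.1, 0.9), "cat": (1.0, 0.0), "cats": (0.9, 0.1)}

def nearest_token(vec):
    """Round one denoised embedding to the closest vocabulary entry."""
    return min(VOCAB, key=lambda w: sum((a - b) ** 2 for a, b in zip(vec, VOCAB[w])))

def contextual_decode(vecs):
    """Stand-in for an autoregressive decoder: choose tokens left to right,
    letting the previous token break ties (here: "a" forces the singular)."""
    out = []
    for vec in vecs:
        tok = nearest_token(vec)
        if out and out[-1] == "a" and tok == "cats":
            tok = "cat"  # context disambiguates near-tied candidates
        out.append(tok)
    return out

# Denoised embeddings for "a" followed by something between "cat"/"cats":
noisy = [(0.08, 0.92), (0.93, 0.06)]
print([nearest_token(v) for v in noisy])  # independent rounding per position
print(contextual_decode(noisy))           # decode conditioned on prefix
```

Independent rounding commits each position in isolation, so an embedding landing nearer "cats" yields the ungrammatical "a cats"; conditioning on the already-decoded prefix is what lets the second stage recover "a cat".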

model release · Google DeepMind

Google DeepMind releases Nano Banana 2 image model with Pro-level capabilities at faster speeds

Google DeepMind has released Nano Banana 2, an image generation model that pairs the advanced world knowledge and subject consistency previously associated with its Pro tier with inference speeds comparable to its Flash offering. The model is positioned as production-ready.

product update

Adobe Firefly adds Quick Cut feature to auto-generate video drafts from raw footage

Adobe has added Quick Cut to Firefly, an AI-powered feature that automatically generates first-draft videos from raw footage based on user instructions. The tool is designed to reduce manual editing time by processing footage and applying cuts, transitions, and basic structure without requiring frame-by-frame manual work.

2 min read · via techcrunch.com
product update

AIG deploys agentic AI system with orchestration layer for underwriting

American International Group (AIG) has deployed an agentic AI system with an orchestration layer and reports faster-than-expected productivity gains in underwriting and portfolio management, citing measurable improvements in throughput and workflow efficiency in recent investor disclosures.