LLM News

Every LLM release, update, and milestone.

benchmark

MPCEval benchmark reveals multi-party conversation generation lags on speaker modeling and consistency

Researchers introduced MPCEval, a reference-free benchmark for evaluating multi-party conversation generation, a capability increasingly used in smart replies and collaborative AI assistants. The benchmark decomposes conversation quality into three dimensions: speaker modeling, content quality, and speaker-content consistency. Testing on public and real-world datasets revealed that single-score metrics obscure fundamental differences in how models handle complex conversational behavior, with current models struggling in particular with turn-taking, participation balance, and maintaining consistent role-dependent speech across longer exchanges.
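To make the decomposition concrete, here is a minimal sketch of multi-dimensional conversation scoring in the spirit described above. The three scorers are illustrative stand-ins, not MPCEval's actual metrics, and the heuristics (participation balance, length-based content proxy, speaker-switch rate) are assumptions for the example.

```python
# Hypothetical sketch: score a multi-party conversation on three separate
# dimensions instead of one aggregate number. All three heuristics below
# are stand-ins, not the benchmark's real metrics.

def score_conversation(turns):
    """Return per-dimension scores for a list of (speaker, text) turns."""
    speakers = [spk for spk, _ in turns]
    # Speaker modeling: participation balance (1.0 = perfectly even turns).
    counts = {s: speakers.count(s) for s in set(speakers)}
    balance = min(counts.values()) / max(counts.values())
    # Content quality: crude proxy via average turn length vs. a target.
    avg_len = sum(len(text.split()) for _, text in turns) / len(turns)
    content = min(avg_len / 10.0, 1.0)
    # Speaker-content consistency: fraction of adjacent turns that switch
    # speaker, a rough proxy for plausible turn-taking.
    switches = sum(a != b for a, b in zip(speakers, speakers[1:]))
    consistency = switches / (len(turns) - 1)
    return {"speaker_modeling": round(balance, 2),
            "content_quality": round(content, 2),
            "consistency": round(consistency, 2)}

convo = [("alice", "Shall we plan the release?"),
         ("bob", "Yes, I can draft the notes."),
         ("alice", "Great, I will review them."),
         ("alice", "Also we need a changelog.")]
print(score_conversation(convo))
```

Reporting the three numbers separately is the point: a model can look fine on an averaged score while one dimension (here, participation balance) is clearly weak.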

research

CoDAR framework closes the gap between continuous and discrete diffusion language models

A new paper identifies token rounding as the primary bottleneck limiting continuous diffusion language models (DLMs) and proposes CoDAR, a two-stage framework that maintains continuous embedding-space diffusion while using an autoregressive Transformer decoder for contextualized token discretization. Experiments on LM1B and OpenWebText show CoDAR achieves performance competitive with discrete diffusion approaches while offering tunable fluency-diversity trade-offs.
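The contrast between the two discretization strategies can be illustrated with a toy example. The tiny vocabulary, 2-D "embeddings", and the tie-breaking "decoder" below are hypothetical stand-ins; a real system would use a learned autoregressive decoder, not a hand-written rule.

```python
# Toy illustration of the discretization step: naive per-position rounding
# of a denoised embedding to its nearest vocabulary vector, vs. a
# context-aware decode that uses earlier tokens to resolve near-ties.
# Vocabulary and vectors are illustrative stand-ins only.

VOCAB = {"the": (0.0, 1.0), "a": (0.1, 0.9), "cat": (1.0, 0.0), "cats": (0.9, 0.1)}

def nearest_token(vec):
    """Round one denoised embedding to the closest vocabulary entry."""
    return min(VOCAB, key=lambda w: sum((a - b) ** 2 for a, b in zip(vec, VOCAB[w])))

def contextual_decode(vecs):
    """Stand-in for an autoregressive decoder: choose tokens left to right,
    letting the previous token break ties (here: "a" forces the singular)."""
    out = []
    for vec in vecs:
        tok = nearest_token(vec)
        if out and out[-1] == "a" and tok == "cats":
            tok = "cat"  # context disambiguates near-tied candidates
        out.append(tok)
    return out

# Denoised embeddings for "a" followed by something between "cat"/"cats":
noisy = [(0.08, 0.92), (0.93, 0.06)]
print([nearest_token(v) for v in noisy])  # independent rounding per position
print(contextual_decode(noisy))           # decode conditioned on prefix
```

Independent rounding commits each position in isolation, so an embedding landing nearer "cats" yields the ungrammatical "a cats"; conditioning on the already-decoded prefix is what lets the second stage recover "a cat".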

model release · Google DeepMind

Google DeepMind releases Nano Banana 2 image model with Pro-level capabilities at faster speeds

Google DeepMind has released Nano Banana 2, an image generation model that pairs the advanced world knowledge and subject consistency previously associated with its Pro tier with inference speeds comparable to its Flash offering. The model is positioned as production-ready.

product update

Adobe Firefly adds Quick Cut feature to auto-generate video drafts from raw footage

Adobe has added Quick Cut to Firefly, an AI-powered feature that automatically generates first-draft videos from raw footage based on user instructions. The tool is designed to reduce manual editing time by processing footage and applying cuts, transitions, and basic structure without requiring frame-by-frame manual work.

2 min read · via techcrunch.com
product update

AIG deploys agentic AI system with orchestration layer for underwriting

American International Group (AIG) has deployed an agentic AI system with an orchestration layer and reports faster-than-expected productivity gains in underwriting and portfolio management, citing measurable improvements in throughput and workflow efficiency in recent investor disclosures.