LLM News

Every LLM release, update, and milestone.

Filtered by: multimodal-ai
research

Researchers identify 'Lazy Attention' problem in multimodal AI training, boost reasoning by 7%

A new arXiv paper identifies a critical flaw in how multimodal large reasoning models are initialized for training: they fail to attend properly to visual tokens, a phenomenon the researchers call Lazy Attention Localization. The team proposes AVAR, a framework that corrects this through visual-anchored data synthesis and attention-guided objectives, achieving an average improvement of 7% across seven multimodal reasoning benchmarks when applied to Qwen2.5-VL-7B.
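The summary does not spell out AVAR's objective, so purely as an illustration of what an "attention-guided" auxiliary term could look like in general, here is a minimal sketch; the function name `visual_attention_loss`, the `target_mass` parameter, and the hinge formulation are assumptions for illustration, not the paper's actual loss.

```python
import torch

def visual_attention_loss(attn_weights, visual_mask, target_mass=0.3):
    """Illustrative sketch only: penalize attention heads whose mass on
    visual tokens falls below a target, one conceivable way to discourage
    'lazy' attention to the image.
    attn_weights: (batch, heads, query_len, key_len), rows sum to 1.
    visual_mask:  (batch, key_len) boolean, True at visual-token positions."""
    mass_on_visual = (attn_weights * visual_mask[:, None, None, :].float()).sum(dim=-1)
    mean_mass = mass_on_visual.mean(dim=(1, 2))        # average over heads and queries
    return torch.relu(target_mass - mean_mass).mean()  # zero once the target mass is met
```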

research

Crab+: New audio-visual model solves negative transfer problem in multimodal learning

A new audio-visual large language model called Crab+ addresses a critical problem in multimodal learning: negative transfer, where training on multiple tasks simultaneously causes performance degradation on nearly 55% of tasks. The model uses a new dataset of 222K samples and a technique called Interaction-aware LoRA to coordinate different audio-visual tasks, reversing the degradation trend to achieve positive transfer on 88% of tasks.
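The blurb does not describe how Interaction-aware LoRA works internally; as a rough sketch of the general idea of task-coordinated low-rank adapters (not Crab+'s actual design), one could imagine per-task LoRA experts mixed by a learned gate, as below. The class name `TaskGatedLoRA` and the gating scheme are hypothetical.

```python
import torch
import torch.nn as nn

class TaskGatedLoRA(nn.Module):
    """Hypothetical sketch: per-task low-rank adapters whose outputs are
    mixed by a learned gate, one way several audio-visual tasks could share
    a backbone while limiting interference between them."""
    def __init__(self, dim, rank=8, num_tasks=4):
        super().__init__()
        self.down = nn.ModuleList(nn.Linear(dim, rank, bias=False) for _ in range(num_tasks))
        self.up = nn.ModuleList(nn.Linear(rank, dim, bias=False) for _ in range(num_tasks))
        self.gate = nn.Linear(dim, num_tasks)  # scores how strongly each task expert applies

    def forward(self, hidden):  # hidden: (batch, seq, dim)
        weights = torch.softmax(self.gate(hidden.mean(dim=1)), dim=-1)  # (batch, num_tasks)
        delta = torch.stack(
            [up(down(hidden)) for down, up in zip(self.down, self.up)], dim=1
        )  # (batch, num_tasks, seq, dim)
        return hidden + torch.einsum("bt,btsd->bsd", weights, delta)
```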

research

Research proposes MoD-DPO to reduce cross-modal hallucinations in multimodal LLMs

Researchers have introduced Modality-Decoupled Direct Preference Optimization (MoD-DPO), a framework designed to reduce cross-modal hallucinations in omni-modal large language models. The method adds modality-aware regularization to enforce sensitivity to relevant modalities while reducing reliance on spurious correlations, showing consistent improvements across audiovisual benchmarks.
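The exact regularizer is not given in this summary; as a toy sketch of the general shape of a DPO loss plus a modality-sensitivity penalty, something like the following could apply, where the function name, the masked-modality log-probabilities, and the hinge-style regularizer are illustrative assumptions rather than MoD-DPO's formulation.

```python
import torch.nn.functional as F

def modality_aware_dpo_loss(
    policy_chosen_logps,          # log pi(y_w | x, all modalities)
    policy_rejected_logps,        # log pi(y_l | x, all modalities)
    ref_chosen_logps,             # log pi_ref(y_w | x, all modalities)
    ref_rejected_logps,           # log pi_ref(y_l | x, all modalities)
    policy_chosen_logps_masked,   # log pi(y_w | x, relevant modality masked out)
    beta=0.1,
    lam=0.5,
):
    """Illustrative sketch: standard DPO preference term plus a regularizer
    that rewards a drop in chosen-response likelihood when the relevant
    modality is removed, i.e., the model should actually rely on it."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    dpo_term = -F.logsigmoid(chosen_rewards - rejected_rewards)

    # Penalize cases where masking the relevant modality does not hurt the
    # chosen answer's likelihood (a sign of spurious, modality-ignoring behavior).
    sensitivity_gap = policy_chosen_logps - policy_chosen_logps_masked
    reg_term = F.relu(-sensitivity_gap)

    return (dpo_term + lam * reg_term).mean()
```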

research

Perception-R1 uses visual reward signals to improve multimodal AI reasoning

Researchers propose Perception-R1, a method that adds visual perception reward signals to reinforcement learning training for multimodal AI models. The approach achieves state-of-the-art results on multiple reasoning benchmarks using just 1,442 training examples by explicitly teaching models to accurately perceive visual content before reasoning about it.
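The reward design is only described here as adding "visual perception reward signals"; a toy sketch of mixing a perception score with answer correctness before feeding it to an RL update might look like the following, where all function names and the fact-overlap scoring are assumptions, not Perception-R1's actual reward.

```python
def perception_reward(described_facts: str, reference_facts: list[str]) -> float:
    # Fraction of annotated visual facts the model's description mentions.
    if not reference_facts:
        return 0.0
    hits = sum(1 for fact in reference_facts if fact.lower() in described_facts.lower())
    return hits / len(reference_facts)

def answer_reward(predicted: str, reference: str) -> float:
    # Binary correctness of the final answer.
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0

def combined_reward(described_facts, reference_facts, predicted, reference, alpha=0.5):
    # Weighted mix of perception and answer rewards, as one might plug into a
    # PPO/GRPO-style policy update during RL fine-tuning.
    return (alpha * perception_reward(described_facts, reference_facts)
            + (1 - alpha) * answer_reward(predicted, reference))
```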

research · Apple

Apple Research Identifies 'Text-Speech Understanding Gap' Limiting LLM Speech Performance

Apple researchers have identified a fundamental limitation in speech-adapted large language models: they consistently underperform their text-based counterparts on language understanding tasks. The team terms this the 'text-speech understanding gap' and documents that speech-adapted LLMs lag behind both their original text versions and cascaded speech-to-text pipelines.

benchmark

New benchmark reveals AI models struggle with personal photo retrieval tasks

A new benchmark evaluating AI models on photo retrieval reveals significant limitations in their ability to find specific images from personal collections. The test presents models with a seemingly simple task, locating a particular photo, yet the results demonstrate the gap between general image recognition and practical personal image search.