multimodal-ai

9 articles tagged with multimodal-ai

April 10, 2026
model release

Meta launches proprietary Muse Spark, abandoning open-source strategy after $14.3B rebuild

Meta launched Muse Spark on April 8, 2026, a natively multimodal reasoning model with tool-use and visual chain-of-thought capabilities. Unlike Llama, it is entirely proprietary with no open weights. The model scores 52 on AI Index v4.0 and excels on health benchmarks; the launch marks Meta's departure from its open-source identity.

April 8, 2026
model release

Meta replaces Llama with Muse Spark AI, launches Contemplating mode for complex reasoning

Meta has discontinued its Llama model line and launched Muse Spark as the foundation of its new AI strategy under Meta Superintelligence Labs. The model features a Contemplating mode for complex reasoning tasks and specializes in multimodal perception, health applications, and agentic tasks. Muse Spark is available today in Meta AI apps, with a private API preview for select partners.

model release

Meta launches Muse Spark, proprietary AI model built by Wang's Superintelligence Labs

Meta announced Muse Spark, its first major large language model since hiring Scale AI's Alexandr Wang nine months ago in a $14.3 billion deal. The proprietary model emphasizes efficiency and multimodal reasoning over top-tier performance, marking a strategic shift from Meta's previous open-source Llama approach. Muse Spark will power Meta's AI assistant across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban glasses in the coming weeks.

March 27, 2026
research

Meta's TRIBE v2 AI predicts brain activity from images, audio, and speech with 70,000-voxel fMRI mapping

Meta's FAIR lab released TRIBE v2, an AI model that predicts human brain activity from images, audio, and text. Trained on over 1,000 hours of fMRI data from 720 subjects, the model maps predictions to 70,000 voxels, and its predictions often match group-average brain responses more closely than individual brain scans do.

March 26, 2026
product update

Google rolls out Search Live globally with Gemini 3.1 Flash Live model

Google has begun globally rolling out Search Live, enabling users in 200+ countries and territories to point their phone camera at objects and ask questions about what they see. The expansion is powered by Google's Gemini 3.1 Flash Live model, designed to be natively multilingual with faster, more reliable performance.

February 24, 2026
research

Apple Research Identifies 'Text-Speech Understanding Gap' Limiting LLM Speech Performance

Apple researchers have identified a fundamental limitation in speech-adapted large language models: they consistently underperform their text-based counterparts on language understanding tasks. The team terms this the 'text-speech understanding gap' and documents that speech-adapted LLMs lag behind both their original text versions and cascaded speech-to-text pipelines.

February 22, 2026
benchmark

New benchmark reveals AI models struggle with personal photo retrieval tasks

A new benchmark evaluating AI models on photo retrieval reveals significant limitations in their ability to find specific images from personal collections. The test presents models with what appears to be a simple task, locating a particular photo, yet the results demonstrate the gap between general image recognition and practical personal image search.

February 20, 2026
product update

Google integrates Lyria 3 music generation into Gemini with text-to-music and cover art

Google DeepMind has integrated its Lyria 3 model into Gemini, enabling users to generate 30-second music tracks with vocals, lyrics, and cover art from text prompts or uploaded media. The model represents an expansion of Google's multimodal AI capabilities into creative audio generation.

product update

Google rolls out Lyria 3 music generation to all Gemini app users

Google is rolling out Lyria 3, its music generation model, to all Gemini app users. The expansion follows recent releases of audio overviews, image generation, and video capabilities in the Gemini ecosystem.