
Meta launches Muse Spark, its first frontier model and first closed-weight AI system

TL;DR

Meta Superintelligence Labs has launched Muse Spark, a native multimodal reasoning model that scores 52 on the Artificial Analysis Intelligence Index, placing it among the top five models tested. It is Meta's first frontier-class model and its first AI system without open weights, a break from the company's open-source Llama strategy. Muse Spark's output-token usage is comparable to that of Gemini 3.1 Pro Preview, and Meta says it matches Llama 4 Maverick's capabilities with over an order of magnitude less compute.

Meta Superintelligence Labs has unveiled Muse Spark, a native multimodal reasoning model that marks two firsts for the company: it is Meta's first frontier-class model and its first system without open weights.

Benchmark Performance

Muse Spark scored 52 on the Artificial Analysis Intelligence Index, placing it in the top five of all models tested. Only Gemini 3.1 Pro Preview (the top performer), GPT-5.4, and Claude Opus 4.6 scored higher. For context, Meta's previous models, Llama 4 Maverick and Llama 4 Scout, scored just 18 and 13 points, respectively, when they launched in April 2025.

Independent testing by Artificial Analysis shows the model closing the gap to the frontier in a single release. However, Artificial Analysis flagged a weakness in agentic tasks: on the GDPval-AA work-task benchmark, Muse Spark scored 1,427 points, behind Claude Sonnet 4.6 (1,648) and GPT-5.4 (1,676).
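To put the reported numbers side by side, the short sketch below computes how Muse Spark's Intelligence Index score compares with the Llama 4 models and how far it trails on GDPval-AA. The scores come from the figures quoted above; the helper names `multiples` and `deficits` are illustrative, and a ratio of index scores is only a rough guide, since nothing here establishes that either scale is linear.

```python
# Scores as reported in the article.
intelligence_index = {"Muse Spark": 52, "Llama 4 Maverick": 18, "Llama 4 Scout": 13}
gdpval_aa = {"Muse Spark": 1427, "Claude Sonnet 4.6": 1648, "GPT-5.4": 1676}


def multiples(scores: dict[str, float], subject: str) -> dict[str, float]:
    """Return the subject's score divided by each other model's score."""
    return {
        name: scores[subject] / value
        for name, value in scores.items()
        if name != subject
    }


def deficits(scores: dict[str, float], subject: str) -> dict[str, float]:
    """Return the number of points by which each other model leads the subject."""
    return {
        name: value - scores[subject]
        for name, value in scores.items()
        if name != subject
    }


if __name__ == "__main__":
    for name, ratio in multiples(intelligence_index, "Muse Spark").items():
        print(f"Intelligence Index: Muse Spark is {ratio:.2f}x {name}")
    for name, gap in deficits(gdpval_aa, "Muse Spark").items():
        print(f"GDPval-AA: {name} leads Muse Spark by {gap} points")
```

On these figures, Muse Spark's index score is four times that of Llama 4 Scout, while it trails Claude Sonnet 4.6 and GPT-5.4 on GDPval-AA by 221 and 249 points.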

In Meta's own testing, Muse Spark achieved 58% on Humanity's Last Exam and 38% on FrontierScience Research. In extended thinking mode without tools, it scored 50.2% on Humanity's Last Exam (No Tools), outperforming both Gemini 3.1 and GPT-5.4 Pro on that benchmark.

Key Capabilities and Architecture

Muse Spark is a native multimodal model with three core capabilities: tool use, visual chain-of-thought reasoning, and multi-agent orchestration. It includes a "Contemplating Mode" designed to rival the deep-reasoning modes of other frontier models, such as Gemini Deep Think and GPT Pro.

Meta rebuilt its pretraining stack from the ground up over nine months, changing the model architecture, optimization, and data curation. According to Meta, Muse Spark matches Llama 4 Maverick's capabilities using over an order of magnitude less compute, which the company presents as substantially more efficient than competing base models.

The company takes two approaches to test-time compute. The first applies thought-time penalties to optimize token consumption. Meta observed a phenomenon it calls "thought compression": the model initially improves by thinking longer, then compresses its reasoning to solve problems with fewer tokens, and later expands its solutions again for stronger results. The second approach is multi-agent orchestration, which deploys multiple agents in parallel on a difficult problem to boost performance without adding latency.
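The two test-time compute ideas can be illustrated with a toy sketch. Meta has not published the form of its penalty or its orchestration logic, so everything here is an assumption made for illustration: the linear penalty in `penalized_reward`, the `penalty_per_token` value, the stand-in `run_agent` function, and the majority vote used to combine answers are all hypothetical.

```python
import asyncio
from collections import Counter


def penalized_reward(correct: bool, thinking_tokens: int,
                     penalty_per_token: float = 1e-4) -> float:
    """Task reward minus a cost that grows with reasoning length.

    Hypothetical linear form: a correct answer reached with fewer
    thinking tokens earns more than the same answer reached with many.
    """
    task_reward = 1.0 if correct else 0.0
    return task_reward - penalty_per_token * thinking_tokens


async def run_agent(agent_id: int, problem: str) -> str:
    """Stand-in for one agent; a real system would call a model here."""
    await asyncio.sleep(0.01)  # simulated latency, the same for every agent
    # Toy behaviour: agents whose id is a multiple of 3 answer "41", the rest "42".
    return "42" if agent_id % 3 else "41"


async def orchestrate(problem: str, n_agents: int = 6) -> str:
    """Run all agents concurrently and return the most common answer."""
    answers = await asyncio.gather(
        *(run_agent(i, problem) for i in range(n_agents))
    )
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(penalized_reward(True, thinking_tokens=2000))
    print(penalized_reward(True, thinking_tokens=500))
    print(asyncio.run(orchestrate("toy problem")))
```

Because the agents run concurrently, the wall-clock time of `orchestrate` is roughly that of the slowest single agent rather than the sum of all of them, which is the sense in which parallel agents can add capability without adding latency.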

Artificial Analysis's measurements support the token-efficiency claims: Muse Spark consumed 58 million output tokens for the full Intelligence Index run, about the same as Gemini 3.1 Pro Preview (57 million) and well below Claude Opus 4.6 (157 million) and GPT-5.4 (120 million).
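The token counts above are easier to compare as ratios. The following minimal sketch normalizes each model's reported output tokens to Muse Spark's; the figures are those quoted in the article, and the function name `relative_usage` is illustrative.

```python
# Output tokens (in millions) for the full Intelligence Index run, as reported.
output_tokens_millions = {
    "Muse Spark": 58,
    "Gemini 3.1 Pro Preview": 57,
    "Claude Opus 4.6": 157,
    "GPT-5.4": 120,
}


def relative_usage(tokens: dict[str, float], baseline: str) -> dict[str, float]:
    """Express each model's token count as a multiple of the baseline's."""
    base = tokens[baseline]
    return {name: count / base for name, count in tokens.items()}


if __name__ == "__main__":
    ratios = relative_usage(output_tokens_millions, "Muse Spark")
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{name}: {ratio:.2f}x Muse Spark's output tokens")
```

On these figures, Claude Opus 4.6 used roughly 2.7 times and GPT-5.4 roughly 2.1 times as many output tokens as Muse Spark.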

Closed Weights Mark Strategic Shift

Unlike the Llama family, Muse Spark is not open-weight and cannot be run locally. That is a sharp break from the open-source playbook Meta has championed for years. Meta's AI chief, Alexandr Wang, said the company has "plans to open-source future versions," suggesting that closed weights may not be permanent policy. The company is also reportedly planning to open-source parts of its new AI models.

Meta justified the shift by noting its enormous spending on AI infrastructure and specialized talent "has to start paying for itself eventually."

Health and Multimodal Focus

Meta partnered with more than 1,000 doctors to curate high-quality, factually accurate training data for health applications. The model can generate interactive displays that break down the nutritional value of food or show which muscles activate during specific exercises. Meta emphasized multimodal perception and health as primary use cases, though interactive applications such as mini-game generation are also possible.

Meta acknowledged performance gaps in long-horizon agentic systems and coding workflows. The company also reported that Muse Spark frequently labeled test scenarios as "alignment traps" during security evaluation, a sign of "evaluation awareness," in which a model appears to recognize that it is being tested.

Availability and Future Plans

Muse Spark is live on meta.ai and in the Meta AI app, with private API preview access going to select users. Pricing has not been disclosed.

Meta frames Muse Spark as "the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts" toward "personal superintelligence." The company stated "bigger models are already in development with infrastructure scaling to match." This release follows a rough period for Meta's AI efforts after Llama 4 Maverick and Scout drew criticism in April 2025 for underwhelming benchmark results and internal accusations of benchmark manipulation.

What This Means

Muse Spark shows that Meta can reach the frontier in a single release, closing a gap that seemed substantial just months ago. However, persistent weaknesses in agentic tasks and the company's own admission of gaps in coding workflows suggest the model may not yet be ready for autonomous agent deployment. The shift to closed weights is pragmatic, since Meta's infrastructure spending demands commercial revenue, but the stated plan to open-source future versions leaves the door open to a return to its original strategy. Real-world performance on extended reasoning tasks will be the critical test; benchmark scores alone may not reflect usability in production environments.
