model release

Meta launches Muse Spark model with private API preview and 16 integrated tools

TL;DR

Meta announced Muse Spark today, its first model release since Llama 4 a year ago. The hosted model is available in private API preview and on meta.ai with Instant and Thinking modes, benchmarking competitively against Anthropic's Opus 4.6 and Google's Gemini 3.1 Pro, though behind on Terminal-Bench 2.0.

3 min read
0

Meta Launches Muse Spark: First Major Model Since Llama 4

Meta announced Muse Spark today, marking its first significant model release since Llama 4 in April 2025. The model is available as a private API preview to select users and on meta.ai's chat interface, which requires Facebook or Instagram login.

Performance and Positioning

Meta's self-reported benchmarks show Muse Spark performing at parity with Anthropic's Opus 4.6, Google's Gemini 3.1 Pro, and OpenAI's GPT 5.4 on selected metrics. However, the model notably underperforms on Terminal-Bench 2.0. Meta explicitly acknowledges performance gaps in long-horizon agentic systems and coding workflows, stating they "continue to invest in areas with current performance gaps."

Deployment Modes

Muse Spark is accessible through two distinct modes on meta.ai:

Instant Mode: Faster inference with direct SVG/HTML output capabilities. Testing showed this mode generates SVG code directly with embedded comments.

Thinking Mode: Extended reasoning with wrapped output and enhanced problem-solving. Early benchmarking demonstrates notably better performance on complex tasks compared to Instant mode.

Meta plans a future Contemplating Mode offering significantly longer reasoning times, comparable to Google's Gemini Deep Think and OpenAI's GPT-5.4 Pro.

16 Integrated Tools Exposed

Meta's chat interface includes 16 publicly accessible tools. Notably, Meta did not restrict disclosure of these tools, allowing direct enumeration without workarounds.

Web Capabilities:

  • browser.search: Web search through undisclosed engine
  • browser.open: Full page retrieval from search results
  • browser.find: Pattern matching on page content

Meta Platform Integration:

  • meta_1p.content_search: Semantic search across Instagram, Threads, and Facebook posts (user-accessible content only, post-2025-01-01)
  • meta_1p.meta_catalog_search: Product catalog search
  • container.download_meta_1p_media: Pull media from Meta platforms into sandbox

Code Execution and Artifacts:

  • container.python_execution: Code Interpreter running Python 3.9.25 with pandas, numpy, matplotlib, plotly, scikit-learn, PyMuPDF, Pillow, and OpenCV. Note: Python 3.9 reached end-of-life status in October 2024.
  • container.create_web_artifact: HTML/JavaScript sandboxed rendering (Claude Artifacts-style)
  • container.visual_grounding: Image analysis with object detection, bounding box generation, point localization, and object counting

File Operations:

  • container.view, container.insert, container.str_replace: Text editor commands mirroring Claude's implementation
  • container.file_search: Semantic search across uploaded files

Media and Agents:

  • media.image_gen: Image generation with "artistic" and "realistic" modes; returns "square", "vertical", or "landscape" formats
  • subagents.spawn_agent: Sub-agent spawning for delegated research and analysis
  • third_party.link_third_party_account: Account linking for Google Calendar, Outlook Calendar, Gmail, Outlook

Technical Observations

Direct testing revealed Python 3.9.25 and SQLite 3.34.1 (January 2021 release) running in the sandbox environment. The Thinking mode's tendency to wrap SVG in HTML shells with Playables SDK v1.0.0 JavaScript libraries suggests integration with Meta's interactive content framework.

What This Means

Muse Spark positions Meta as a credible competitor in the hosted LLM space, though the private preview status limits immediate adoption. The 16-tool architecture demonstrates Meta's integration strategy: leveraging Facebook, Instagram, and Threads data access as competitive moat while matching feature parity with Claude and ChatGPT. The acknowledged gaps in agentic systems and coding—combined with Python 3.9 EOL tooling—suggest this release prioritizes chat functionality over agent-grade reliability. The planned Contemplating mode indicates Meta is responding directly to Deep Research and Pro reasoning mode adoption patterns.

Related Articles

model release

Cohere Releases Command A+ Open Source Model with 25B Active Parameters, 128K Context

Cohere has released Command A+ as an open source model under Apache 2.0 license. The sparse mixture-of-experts architecture features 25 billion active parameters out of 218B total parameters, supports 128K input context length, and includes vision capabilities alongside tool use and reasoning features.

model release

Cohere Releases Command A+: 218B-Parameter MoE Model With 4-Bit Quantization Runs on Single B200 GPU

Cohere has released Command A+, an open-source sparse mixture-of-experts model with 218 billion total parameters and 25 billion active parameters. The model features W4A4 quantization allowing deployment on a single Nvidia B200 GPU, supports 128K input context, and includes built-in chain-of-thought reasoning with vision capabilities.

model release

Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June

Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.

model release

Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis

Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.

Comments

Loading...