multimodal AI

8 articles tagged with multimodal AI

July 7, 2026

model release

Meta launches Muse Image, a free AI image generator integrated across Instagram, WhatsApp, and Facebook Marketplace

Meta has launched Muse Image, a new AI image generator from its Meta Superintelligence Labs division. The model is available free for Instagram Stories, WhatsApp, and the Meta AI app, with integration into Facebook Marketplace for visualizing used furniture in home settings.

July 7, 2026 · 10:35 PM

May 24, 2026

model releaseStability AI

Stability AI Releases Stable Audio 3 Medium: 2B-Parameter Audio Generation Model with 180-Second Output in Under 2 Secon

Stability AI has released Stable Audio 3 Medium, a 2 billion parameter latent diffusion model capable of generating variable-length audio up to 380 seconds. The model generates music and sound effects in less than 2 seconds on an H200 GPU, trained on 1.28 million licensed and Creative Commons audio recordings.

May 24, 2026 · 1:05 AM

May 22, 2026

model release

Google releases Gemini 3.5 Flash and autonomous agent Gemini Spark at I/O 2026

Google announced Gemini 3.5 Flash and Gemini Spark at I/O 2026. Gemini 3.5 Flash now powers Google's AI Mode search, while Spark is a cloud-based autonomous agent that can monitor credit card statements, track emails, and interact with third-party services like OpenTable and Instacart.

May 22, 2026 · 11:21 AM

May 12, 2026

product update

Meta AI app adds voice conversations and live camera features powered by Muse Spark

Meta has added voice conversation capabilities and live camera analysis to its Meta AI app, both powered by the company's Muse Spark model. The features previously available only on Meta's AI glasses are now rolling out to the standalone app, which ranks fourth on the U.S. iPhone App Store.

May 12, 2026 · 7:50 PM

April 28, 2026

product updateAmazon Web Services

Amazon Nova 2 Sonic Unifies Speech Recognition, Reasoning, and TTS in Single Streaming Model

Amazon Web Services released technical guidance for migrating text agents to voice assistants using Amazon Nova 2 Sonic, a native speech-to-speech model that combines automatic speech recognition, reasoning, tool calling, and text-to-speech in a single bidirectional streaming interface. The model supports asynchronous tool calling and built-in voice activity detection for handling interruptions.

April 28, 2026 · 6:06 PM

April 24, 2026

product updateOpenAI

OpenAI releases ChatGPT Images 2.0 with accurate text rendering and brand-style matching

OpenAI launched ChatGPT Images 2.0, upgrading from decorative images to full-page graphics with detailed text rendering. The update is available to all ChatGPT tiers, with advanced features requiring paid subscriptions that access the Thinking model. Hands-on testing shows significant improvements in text accuracy and brand-style replication, though factual errors still occur.

April 24, 2026 · 1:36 PM

April 21, 2026

model releaseOpenAI

OpenAI releases ChatGPT Images 2.0 with integrated reasoning and text-image composition

OpenAI has released ChatGPT Images 2.0, which integrates reasoning capabilities to generate complex visual compositions combining text and images. The model supports aspect ratios from 3:1 to 1:3 and outputs up to 2K resolution, with advanced features available to Plus, Pro, Business, and Enterprise users.

April 21, 2026 · 7:21 PM

April 8, 2026

model release

Meta launches proprietary Muse Spark model, abandoning open-source commitment

Meta has released Muse Spark, a proprietary AI model with restricted access via API and portal invite only—a striking reversal from CEO Mark Zuckerberg's 2024 manifesto championing open-source AI. The model claims performance matching top competitors from OpenAI, Anthropic, and Google, trained with an order of magnitude less compute than Llama 4.

April 8, 2026 · 11:20 PM

← Back to all news