Stability AI Releases Stable Audio 3 Medium: 2B-Parameter Audio Generation Model with 180-Second Output in Under 2 Seconds

TL;DR

Stability AI has released Stable Audio 3 Medium, a 2 billion parameter latent diffusion model capable of generating variable-length audio up to 380 seconds. According to the company, the model generates music and sound effects in less than 2 seconds on an H200 GPU. It was trained on roughly 1.28 million licensed and Creative Commons audio recordings.

May 24, 2026 · 1:05 AM · 2 min read

Stability AI has released Stable Audio 3 Medium, a 2 billion parameter latent diffusion model that generates music and sound effects in variable lengths up to 380 seconds (6+ minutes). According to Stability AI, the model produces audio in under 2 seconds on an H200 GPU and "a few seconds" on a MacBook Pro M4.
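To put the speed claim in perspective, the short sketch below converts it into a real-time factor (seconds of audio produced per second of wall-clock time). The article does not state which clip length the "under 2 seconds" figure refers to, so the sketch evaluates both the 180-second length from the headline and the 380-second maximum from the body. The function name and the 2.0-second upper bound are illustrative assumptions, not measurements.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio produced per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Assumption: treat "under 2 seconds" as an upper bound of 2.0 s on an H200.
H200_WALL_SECONDS = 2.0

# Assumption: the clip length behind the claim is unstated; try both figures.
for clip_seconds in (180.0, 380.0):
    factor = realtime_factor(clip_seconds, H200_WALL_SECONDS)
    print(f"{clip_seconds:.0f} s clip: at least {factor:.0f}x real time")
```

Because 2.0 seconds is an upper bound on generation time, the printed factors are lower bounds on throughput under either reading.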

Stable Audio 3 Medium is the middle tier of a three-model family (small, medium, large) of fast latent diffusion models designed to run on consumer-grade hardware.

Technical Architecture

The model operates on a novel semantic-acoustic autoencoder that compresses audio into a compact latent space, enabling efficient generation while preserving audio fidelity. Stability AI claims the architecture encourages semantic structure in the latent representation.

The model underwent adversarial post-training to reduce inference steps while improving generation quality and prompt adherence. It requires 8 diffusion steps at inference time using a "pingpong" sampler, with a CFG scale of 1.0.
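A CFG scale of 1.0 has a practical consequence worth spelling out. Under the standard classifier-free guidance formula, the guided prediction is `uncond + scale * (cond - uncond)`, which collapses to the conditional prediction when the scale is 1.0, so no separate unconditional forward pass is needed. The sketch below illustrates this with plain Python lists; it assumes the model follows the standard formulation, and `apply_cfg` and the sample values are hypothetical, not part of either inference library.

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Hypothetical model outputs for a single denoising step.
cond_pred = [0.5, -1.25, 2.0]     # prediction given the text prompt
uncond_pred = [0.25, -0.75, 1.0]  # prediction given an empty prompt

# At scale 1.0 the guided output is just the conditional prediction.
guided = apply_cfg(cond_pred, uncond_pred, scale=1.0)
print(guided == cond_pred)

# A larger scale pushes the output further from the unconditional prediction.
print(apply_cfg(cond_pred, uncond_pred, scale=2.0))
```

If this assumption holds, each of the 8 sampler steps costs one network evaluation rather than two, which is consistent with the short generation times the company reports.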

Text conditioning uses Google's pre-trained T5Gemma model (t5gemma-b-b-ul2), which is redistributed under separate Gemma Terms of Use.

Training Data

The model was trained on 1,278,902 audio recordings:

806,284 recordings licensed from AudioSparx
472,618 recordings from Freesound (266,324 CC-0, 194,840 CC-BY, 11,454 CC-Sampling+)
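The reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Figures as quoted in the article.
FREESOUND_BY_LICENSE = {
    "CC-0": 266_324,
    "CC-BY": 194_840,
    "CC-Sampling+": 11_454,
}
AUDIOSPARX = 806_284
REPORTED_FREESOUND = 472_618
REPORTED_TOTAL = 1_278_902

# Sum the per-license Freesound counts and the two sources.
freesound_total = sum(FREESOUND_BY_LICENSE.values())
total = AUDIOSPARX + freesound_total

print("Freesound breakdown matches:", freesound_total == REPORTED_FREESOUND)
print("Overall total matches:", total == REPORTED_TOTAL)

# Share of the training set that comes from the licensed AudioSparx catalog.
audiosparx_share = AUDIOSPARX / total
print(f"AudioSparx share: {audiosparx_share:.1%}")
```

Both sums agree with the reported totals, and the licensed AudioSparx catalog accounts for a little under two thirds of the recordings.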

Stability AI reports that music recordings in the Freesound portion were identified using PANNs tagging and sent to a content detection company to verify the absence of copyrighted material. All identified copyrighted content was removed.

Key Capabilities

The model supports:

Variable-length audio generation (up to 380+ seconds demonstrated)
Audio inpainting for targeted editing
Continuation of short recordings
BPM-specific music generation
Style and mood control through text prompts

Availability and Licensing

Stable Audio 3 Medium is available on Hugging Face under the Stability AI Community License. Commercial use requires a separate license from Stability AI. Users must accept both the Stability AI license and the Gemma Terms of Use, including the use restrictions in Section 3.2.

Inference code is available through two libraries: the stable-audio-3 inference library and the stable-audio-tools research library. The model weights are distributed in FP32 format.
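The FP32 distribution format gives a rough idea of the download and memory footprint. The sketch below is a back-of-the-envelope estimate that assumes exactly 2 billion parameters at 4 bytes each. The true parameter count is only given as "2 billion", the article does not say whether the text encoder and autoencoder are included in that figure, and activations are ignored. The FP16 row is a hypothetical comparison; the source only mentions FP32 weights.

```python
# Bytes per parameter for each numeric format.
BYTES_PER_PARAM = {
    "fp32": 4,  # format named in the article
    "fp16": 2,  # hypothetical comparison, not mentioned in the source
}

# Assumption: "2 billion parameters" taken as exactly 2e9.
NOMINAL_PARAMS = 2_000_000_000


def weight_bytes(num_params: int, dtype: str) -> int:
    """Estimate storage for the weights alone, excluding activations."""
    return num_params * BYTES_PER_PARAM[dtype]


for dtype in BYTES_PER_PARAM:
    size = weight_bytes(NOMINAL_PARAMS, dtype)
    print(f"{dtype}: {size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")
```

Under these assumptions the FP32 weights alone come to about 8 GB, which is why available GPU memory matters for consumer deployment.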

What This Means

Stable Audio 3 Medium represents a significant step in accessible audio generation, with claimed sub-2-second generation times that could enable real-time workflows for sound design and music production. The 2B parameter size positions it as deployable on consumer hardware, though actual performance will depend on available GPU memory and compute. The variable-length generation capability addresses a key limitation of fixed-length audio models, reducing computational waste for short sound effects. However, commercial users should note the dual licensing requirement and review Section 3.2 restrictions in the Gemma terms before deployment.

Source: huggingface.co

