Stability AI releases Stable Audio Open Small for on-device audio generation with Arm
Stability AI has open-sourced Stable Audio Open Small, a smaller, faster variant of its text-to-audio model developed in partnership with Arm for on-device deployment. The company says the model preserves output quality and prompt adherence while cutting computational requirements enough to run at the edge on devices powered by Arm's technology, which runs on 99% of smartphones globally.
Key Details
Stable Audio Open Small is a size-optimized variant of Stability AI's existing Stable Audio Open text-to-audio model. The partnership targets Arm's processor architecture, which, according to Stability AI, powers 99% of smartphones globally.
The model maintains the core capabilities of the full Stable Audio Open while reducing model size and inference latency. Stability AI claims the variant preserves output quality and maintains adherence to text prompts despite the architectural compression.
Stability AI has not disclosed the model's parameter count, file size, inference-speed benchmarks, or per-inference cost at launch, and the pricing structure for commercial use remains undisclosed.
Technical Approach
The release focuses on addressing a key constraint in audio generation: computational efficiency on resource-limited devices. By optimizing Stable Audio Open for Arm processors, the model can theoretically run directly on smartphones and edge devices without requiring cloud inference.
Because the release is open source rather than a proprietary commercial variant, developers can integrate the model into applications that require on-device audio generation without relying on external API calls.
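The integration pattern this enables can be sketched as follows. This is a minimal illustration, not a real Stability AI API: every name in it (make_generator, the backend callables) is hypothetical, standing in for whatever inference engine an app actually ships.

```python
from typing import Callable, Optional

# Hypothetical integration pattern -- all names here are illustrative and not
# part of any real Stability AI API. The idea: prefer an on-device generator
# when one is available, and fall back to a cloud endpoint otherwise.

AudioBytes = bytes

def make_generator(
    local_generate: Optional[Callable[[str], AudioBytes]],
    cloud_generate: Callable[[str], AudioBytes],
) -> Callable[[str], AudioBytes]:
    """Return a prompt -> audio function that keeps inference on-device when a
    local backend exists (better privacy, no per-call cost) and only calls out
    to the cloud when it does not."""
    def generate(prompt: str) -> AudioBytes:
        if local_generate is not None:
            return local_generate(prompt)  # processing stays on the device
        return cloud_generate(prompt)      # external API round trip
    return generate

# Stub backends standing in for real inference engines:
on_device = make_generator(lambda p: b"local:" + p.encode(),
                           lambda p: b"cloud:" + p.encode())
cloud_only = make_generator(None, lambda p: b"cloud:" + p.encode())
```

The point of the sketch is that the app-facing interface stays identical whether generation happens locally or remotely, so an on-device model like Stable Audio Open Small can slot in without changing calling code.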
Market Context
Audio generation remains an underdeveloped area compared to text and image generation. Most commercial audio models (including OpenAI's text-to-speech API and ElevenLabs) require cloud-based inference. On-device audio generation has seen limited adoption outside specialized applications.
Stability AI's focus on open-sourcing the model aligns with its broader strategy of releasing foundation models freely to developers, contrasting with the closed commercial approaches of competitors like OpenAI and Anthropic.
What This Means
Stable Audio Open Small enables developers to build audio generation directly into mobile applications without external dependencies, which reduces latency, improves privacy (processing stays on-device), and removes per-inference costs. The practical impact, however, depends on details that remain undisclosed: actual model size, latency benchmarks, audio quality metrics, and real-world performance on typical smartphone hardware. The 99% Arm smartphone penetration claim suggests broad potential reach, but variable performance across devices may limit deployment on older or lower-end hardware. Whether the model achieves production-grade audio quality comparable to cloud-based systems remains unconfirmed.
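The per-inference-cost argument above can be made concrete with a back-of-envelope break-even calculation. All dollar figures here are illustrative assumptions, not Stability AI or cloud-provider pricing (which the article notes is undisclosed): a cloud API billed per generation versus on-device inference with a one-time integration cost and zero marginal cost.

```python
import math

# Back-of-envelope economics of on-device vs. cloud audio generation.
# All dollar figures are assumed for illustration, not real pricing.

def cumulative_cost(calls: int, per_call_usd: float, fixed_usd: float = 0.0) -> float:
    """Total cost of `calls` generations: a fixed cost plus per-call billing."""
    return fixed_usd + calls * per_call_usd

def break_even_calls(cloud_per_call_usd: float, on_device_fixed_usd: float) -> int:
    """Generations after which on-device (fixed cost, zero marginal cost)
    becomes no more expensive than cloud (zero fixed cost, per-call billing)."""
    return math.ceil(on_device_fixed_usd / cloud_per_call_usd)

# Assumed: $0.01 per cloud generation, $500 one-time on-device engineering cost.
n = break_even_calls(0.01, 500.0)  # -> 50000 generations
```

Under these assumed numbers, any app expecting more than a few tens of thousands of generations comes out ahead on-device, before even counting the latency and privacy benefits.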
Related Articles
IBM Releases Granite Embedding 311M R2 With 32K Context, 200+ Language Support
IBM released Granite Embedding 311M Multilingual R2, a 311-million parameter dense embedding model with 32,768-token context length and support for 200+ languages. The model scores 64.0 on Multilingual MTEB Retrieval (18 tasks), an 11.8-point improvement over its predecessor, and ships with ONNX and OpenVINO models for production deployment.
IBM releases Apache 2.0 Granite 4.1 LLMs in 3B, 8B, and 30B sizes
IBM has released the Granite 4.1 family of language models under Apache 2.0 license. The models come in 3B, 8B, and 30B parameter sizes. Unsloth has released 21 GGUF quantized variants of the 3B model ranging from 1.2GB to 6.34GB.
IBM Releases Granite 4.1 30B With 131K Context Window and Enhanced Tool-Calling
IBM released Granite 4.1 30B, a 30-billion parameter instruction-following model with a 131,072 token context window. The model scores 80.16 on MMLU 5-shot and 88.41 on HumanEval pass@1, with enhanced tool-calling capabilities following OpenAI's function definition schema.
IBM Releases Granite 4.1 8B with 131K Context Window at $0.05/M Input Tokens
IBM has released Granite 4.1 8B, an 8-billion-parameter decoder-only language model with a 131,072-token context window. The model supports 12 languages and costs $0.05 per million input tokens and $0.10 per million output tokens, available under the Apache 2.0 license.