Mistral AI Releases Voxtral: Apache 2.0 Speech Models with 32K Token Context at $0.001/Minute

TL;DR

Mistral AI released Voxtral, a family of open-source speech understanding models available in 24B and 3B parameter variants under the Apache 2.0 license. The models support a context of up to 32K tokens (30 minutes of audio for transcription, 40 minutes for understanding) and are priced from $0.001 per minute via API, which Mistral says is less than half the cost of comparable proprietary systems.

May 28, 2026 · 9:51 AM · 2 min read

Mistral AI released Voxtral, a family of open-source speech understanding models available in 24B and 3B parameter variants. Both models are released under the Apache 2.0 license and are available via API starting at $0.001 per minute.

Technical Specifications

Voxtral comes in two versions:

Voxtral Small (24B): Production-scale applications
Voxtral Mini (3B): Local and edge deployments

Both models support 32K token context length, handling up to 30 minutes of audio for transcription or 40 minutes for understanding tasks. The API uses Voxtral Mini Transcribe, an optimized transcription variant.
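For recordings longer than the stated limits, a caller would need to split the audio before sending it. The following is a minimal sketch of that planning step, assuming only the 30-minute and 40-minute figures quoted above; the function name `plan_chunks` and the fixed-length cutting strategy are illustrative, not part of any Mistral tooling.

```python
from typing import List, Tuple

# Limits quoted in the article, expressed in seconds.
TRANSCRIPTION_LIMIT_S = 30 * 60   # up to 30 minutes for transcription
UNDERSTANDING_LIMIT_S = 40 * 60   # up to 40 minutes for understanding tasks


def plan_chunks(duration_s: float, limit_s: float) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds, each span no longer than limit_s.

    Hypothetical helper: cuts at fixed offsets and does not look for pauses.
    """
    if duration_s < 0 or limit_s <= 0:
        raise ValueError("duration must be >= 0 and limit must be > 0")
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + limit_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


# A 75-minute recording planned against the transcription limit.
segments = plan_chunks(75 * 60, TRANSCRIPTION_LIMIT_S)
```

Here `segments` holds three spans: two full 30-minute chunks and a 15-minute remainder. A fixed-offset cut can land mid-word, so a real pipeline would likely prefer cutting at silences.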

Core Capabilities

The models include built-in Q&A and summarization without requiring separate ASR and language model chains. Voxtral supports native multilingual processing with automatic language detection across English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian.

Voxtral enables function-calling directly from voice input, allowing systems to trigger backend functions or API calls based on spoken commands without intermediate parsing. The models retain the text understanding capabilities of their Mistral Small 3.1 language model backbone.
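To illustrate what "triggering backend functions" involves on the application side, the sketch below dispatches a tool call to a registered Python function. The tool-call shape (a `name` plus a JSON string of `arguments`) and the `set_thermostat` function are assumptions made for illustration; the article does not describe Voxtral's actual response format.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical backend function; its name and parameters are illustrative only.
def set_thermostat(room: str, celsius: float) -> str:
    return f"{room} set to {celsius:.1f} C"


# Registry mapping tool names to the callables the application exposes.
TOOLS: Dict[str, Callable[..., Any]] = {"set_thermostat": set_thermostat}


def dispatch(tool_call: Dict[str, Any]) -> Any:
    """Run the registered function named in a tool call.

    Assumes the call arrives as {"name": str, "arguments": JSON string}.
    This is an assumed shape, not a documented Voxtral response format.
    """
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(tool_call["arguments"])
    return TOOLS[name](**kwargs)


# What a model might emit after hearing "set the kitchen to 21 degrees".
example = {"name": "set_thermostat", "arguments": '{"room": "kitchen", "celsius": 21}'}
result = dispatch(example)
```

Keeping an explicit registry means a spoken command can only reach functions the application has chosen to expose.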

Benchmark Performance

According to Mistral AI, Voxtral outperforms Whisper large-v3 across all tested transcription tasks. The company claims Voxtral Small beats GPT-4o mini Transcribe and Gemini 2.5 Flash on all evaluated tasks, achieving state-of-the-art results on English short-form benchmarks and Mozilla Common Voice.

On the FLEURS multilingual benchmark, Mistral reports Voxtral Small surpasses Whisper on every language task, with particular strength in European languages. For audio understanding tasks, the company states Voxtral Small is competitive with GPT-4o mini and Gemini 2.5 Flash, and claims state-of-the-art performance in speech translation.

Word error rates were measured across LibriSpeech, GigaSpeech, VoxPopuli, Switchboard, CHiME-4, SPGISpeech, and Earnings-21/22 datasets for English, plus Mozilla Common Voice 15.1 and FLEURS for multilingual evaluation.
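For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition; the lowercasing and whitespace splitting are simplifying assumptions, and the article does not say which text normalization Mistral applied.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.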

Pricing and Availability

Mistral claims Voxtral Mini Transcribe outperforms OpenAI Whisper at less than half the price, while Voxtral Small matches ElevenLabs Scribe performance at less than half the cost. API pricing starts at $0.001 per minute.
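As a rough budgeting aid, the starting price can be turned into a cost estimate. This sketch assumes billing scales linearly with audio duration at the quoted $0.001 per minute, with no minimum charge or per-request rounding; the article does not describe those billing details, and the helper name is illustrative.

```python
from decimal import Decimal

# Starting API price quoted in the article.
PRICE_PER_MINUTE_USD = Decimal("0.001")


def estimate_cost_usd(audio_minutes: float) -> Decimal:
    """Estimate API cost, assuming purely linear per-minute billing."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be >= 0")
    # Convert through str so float inputs do not carry binary rounding noise.
    return Decimal(str(audio_minutes)) * PRICE_PER_MINUTE_USD


# 10,000 minutes of audio (about 167 hours) at the starting rate.
monthly_cost = estimate_cost_usd(10_000)
```

Under those assumptions, 10,000 minutes of audio comes to 10 US dollars.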

Both models are available for download on Hugging Face. Voxtral will be integrated into Le Chat's voice mode over the coming weeks.

Enterprise Features

Mistral offers private deployment options for production-scale inference within customer infrastructure, including multi-GPU configurations and quantized builds. The company provides domain-specific fine-tuning services for legal, medical, and customer support applications.
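The reason multi-GPU configurations and quantized builds matter for private deployment can be seen from a back-of-the-envelope weight-memory estimate. The sketch below covers model weights only and ignores activations, cache memory, and quantization metadata; the bit widths are illustrative, since the article does not say which quantized formats Mistral offers.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    if params_billion < 0 or bits_per_param <= 0:
        raise ValueError("params must be >= 0 and bits must be > 0")
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Parameter counts from the article; bit widths are illustrative assumptions.
small_16bit = weight_memory_gb(24, 16)  # Voxtral Small at 16 bits per weight
small_4bit = weight_memory_gb(24, 4)    # Voxtral Small at 4 bits per weight
mini_16bit = weight_memory_gb(3, 16)    # Voxtral Mini at 16 bits per weight
```

By this estimate the 24B model needs roughly 48 GB for 16-bit weights and roughly 12 GB at 4 bits, while the 3B model needs roughly 6 GB at 16 bits, which is consistent with positioning the smaller model for local and edge use.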

Mistral is developing additional features including speaker segmentation, emotion detection, word-level timestamps, and non-speech audio recognition.

What This Means

Voxtral is an openly licensed speech model family that, by Mistral's reported benchmarks, is competitive with proprietary systems. The Apache 2.0 license and $0.001/minute starting price could significantly lower barriers for developers building voice-enabled applications, particularly in regulated industries that require on-premises deployment. The 32K token context window targets long-form audio, which many existing open-source ASR systems handle only by splitting recordings into short segments.

Source: mistral.ai
