Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0

TL;DR

Mistral AI has released Mistral Small 4, a 119B total parameter mixture-of-experts model with 256K context window and native multimodal capabilities. The model uses 128 experts with 4 active per token (6B active parameters) and is released under the Apache 2.0 license, marking Mistral's first unified model combining reasoning, multimodal, and coding capabilities.

May 28, 2026 · 10:07 AM2 min read

Mistral Small 4 — Quick Specs

Context window256K tokens

Compare Mistral Small 4 with other models →

Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0

Mistral AI has released Mistral Small 4, a 119B total parameter mixture-of-experts (MoE) model with 256K context window and native multimodal capabilities. The model uses 128 experts with 4 active per token (6B active parameters, 8B including embedding and output layers) and is released under the Apache 2.0 license.

Architecture and Specifications

Mistral Small 4 employs a mixture-of-experts architecture with 128 total experts and 4 active per token. The model has 119B total parameters with 6B active parameters per token. According to Mistral AI, the model supports a 256K context window and accepts both text and image inputs.

The model includes a configurable reasoning_effort parameter that allows users to toggle between fast responses (reasoning_effort="none") equivalent to Mistral Small 3.2's chat style, and deep reasoning mode (reasoning_effort="high") with step-by-step analysis similar to previous Magistral models.

Performance Claims

Mistral AI claims a 40% reduction in end-to-end completion time in latency-optimized setups and 3x more requests per second in throughput-optimized configurations compared to Mistral Small 3. The company states the model achieves competitive scores on benchmarks while generating significantly shorter outputs than comparable models.

On the AA LCR benchmark, Mistral AI reports a score of 0.72 with 1.6K characters of output, compared to Qwen models requiring 5.8-6.1K characters for comparable performance. On LiveCodeBench, the company claims the model outperforms GPT-OSS 120B while producing 20% less output.

Hardware Requirements

Minimum infrastructure requirements:

4x NVIDIA HGX H100, or
2x NVIDIA HGX H200, or
1x NVIDIA DGX B200

Recommended setup for optimal performance:

4x NVIDIA HGX H100, or
4x NVIDIA HGX H200, or
2x NVIDIA DGX B200

Availability and Deployment

The model is available immediately on Mistral API, AI Studio, and Hugging Face under the Apache 2.0 license. It supports inference frameworks including vLLM, llama.cpp, SGLang, and Transformers. The model is available as an NVIDIA NIM for production deployment and can be customized with NVIDIA NeMo for domain-specific fine-tuning.

Mistral AI has joined the NVIDIA Nemotron Coalition as a founding member and collaborated with NVIDIA on inference optimization for vLLM and SGLang.

What This Means

Mistral Small 4 represents the first major open-source model to unify reasoning, multimodal, and coding capabilities in a single release under a permissive license. The 256K context window and MoE architecture with 6B active parameters position it as a deployment-friendly alternative to dense models requiring more compute per token. The Apache 2.0 license allows commercial use and fine-tuning without restrictions, though real-world performance claims will need independent verification across diverse workloads. The configurable reasoning mode is a notable feature that could reduce the need for maintaining separate model deployments for different task types.

Source: mistral.ai ↗

mistral-ai open-source mixture-of-experts multimodal apache-2.0 nvidia reasoning 256k-context

model releaseJuly 9, 2026

NVIDIA Releases Audex-30B-A3B: Unified Audio-Text Model With 1M Token Context and Speech Generation

NVIDIA released Audex-30B-A3B, a unified audio-text model built on the Nemotron-Cascade-2-30B-A3B backbone. The model handles audio understanding, speech recognition and translation, text-to-speech, audio generation, and speech-to-speech while supporting up to 1M token context length.

model releaseJuly 11, 2026

Cohere releases 2B parameter Arabic speech recognition model with 25.9% average WER

Cohere and Cohere Labs released Cohere Transcribe Arabic, a 2B parameter automatic speech recognition model optimized for Arabic dialects and Arabic-English code-switching. The open-source model achieves a 25.9% average word error rate across major Arabic ASR benchmarks, outperforming models up to 30B parameters.

model releaseJuly 9, 2026

OpenAI Releases GPT-5.6 Luna Pro with Extended Reasoning Mode at $1/$6 Per Million Tokens

OpenAI has released GPT-5.6 Luna Pro, a reasoning-enhanced variant of GPT-5.6 Luna with a 1 million token context window. The model is priced at $1 per million input tokens and $6 per million output tokens, with a knowledge cutoff date of February 2026.

model releaseJuly 9, 2026

OpenAI Releases GPT-5.6 Terra Pro with Enhanced Reasoning Mode at $2.50/$15 Per Million Tokens

OpenAI has released GPT-5.6 Terra Pro, a variant of GPT-5.6 Terra configured with enhanced reasoning capabilities for complex tasks. The model features a 1 million token context window and is priced at $2.50 per million input tokens and $15 per million output tokens.

Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0

Mistral Small 4 — Quick Specs

Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0

Architecture and Specifications

Performance Claims

Hardware Requirements

Availability and Deployment

What This Means

Related Articles

NVIDIA Releases Audex-30B-A3B: Unified Audio-Text Model With 1M Token Context and Speech Generation

Cohere releases 2B parameter Arabic speech recognition model with 25.9% average WER

OpenAI Releases GPT-5.6 Luna Pro with Extended Reasoning Mode at $1/$6 Per Million Tokens

OpenAI Releases GPT-5.6 Terra Pro with Enhanced Reasoning Mode at $2.50/$15 Per Million Tokens

Comments