Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0
Mistral AI has released Mistral Small 4, a 119B total parameter mixture-of-experts model with 256K context window and native multimodal capabilities. The model uses 128 experts with 4 active per token (6B active parameters) and is released under the Apache 2.0 license, marking Mistral's first unified model combining reasoning, multimodal, and coding capabilities.
Mistral AI Releases Small 4: 119B Parameter Open-Source Model with 256K Context Under Apache 2.0
Mistral AI has released Mistral Small 4, a 119B total parameter mixture-of-experts (MoE) model with 256K context window and native multimodal capabilities. The model uses 128 experts with 4 active per token (6B active parameters, 8B including embedding and output layers) and is released under the Apache 2.0 license.
Architecture and Specifications
Mistral Small 4 employs a mixture-of-experts architecture with 128 total experts and 4 active per token. The model has 119B total parameters with 6B active parameters per token. According to Mistral AI, the model supports a 256K context window and accepts both text and image inputs.
The model includes a configurable reasoning_effort parameter that allows users to toggle between fast responses (reasoning_effort="none") equivalent to Mistral Small 3.2's chat style, and deep reasoning mode (reasoning_effort="high") with step-by-step analysis similar to previous Magistral models.
Performance Claims
Mistral AI claims a 40% reduction in end-to-end completion time in latency-optimized setups and 3x more requests per second in throughput-optimized configurations compared to Mistral Small 3. The company states the model achieves competitive scores on benchmarks while generating significantly shorter outputs than comparable models.
On the AA LCR benchmark, Mistral AI reports a score of 0.72 with 1.6K characters of output, compared to Qwen models requiring 5.8-6.1K characters for comparable performance. On LiveCodeBench, the company claims the model outperforms GPT-OSS 120B while producing 20% less output.
Hardware Requirements
Minimum infrastructure requirements:
- 4x NVIDIA HGX H100, or
- 2x NVIDIA HGX H200, or
- 1x NVIDIA DGX B200
Recommended setup for optimal performance:
- 4x NVIDIA HGX H100, or
- 4x NVIDIA HGX H200, or
- 2x NVIDIA DGX B200
Availability and Deployment
The model is available immediately on Mistral API, AI Studio, and Hugging Face under the Apache 2.0 license. It supports inference frameworks including vLLM, llama.cpp, SGLang, and Transformers. The model is available as an NVIDIA NIM for production deployment and can be customized with NVIDIA NeMo for domain-specific fine-tuning.
Mistral AI has joined the NVIDIA Nemotron Coalition as a founding member and collaborated with NVIDIA on inference optimization for vLLM and SGLang.
What This Means
Mistral Small 4 represents the first major open-source model to unify reasoning, multimodal, and coding capabilities in a single release under a permissive license. The 256K context window and MoE architecture with 6B active parameters position it as a deployment-friendly alternative to dense models requiring more compute per token. The Apache 2.0 license allows commercial use and fine-tuning without restrictions, though real-world performance claims will need independent verification across diverse workloads. The configurable reasoning mode is a notable feature that could reduce the need for maintaining separate model deployments for different task types.
Related Articles
Mistral Releases Mistral Large 3 with 675B Parameters and Three Ministral 3 Models Under Apache 2.0
Mistral AI has released Mistral 3, consisting of Mistral Large 3—a sparse mixture-of-experts model with 675B total parameters and 41B active parameters—and three Ministral 3 models at 3B, 8B, and 14B parameters. All models are released under the Apache 2.0 license with multimodal capabilities including image understanding.
Mistral AI Releases Voxtral: Apache 2.0 Speech Models with 32K Token Context at $0.001/Minute
Mistral AI released Voxtral, a family of open-source speech understanding models available in 24B and 3B parameter variants under Apache 2.0 license. The models support up to 32K token context (30 minutes of audio for transcription, 40 minutes for understanding) and are priced at $0.001 per minute via API—less than half the cost of comparable proprietary systems according to Mistral.
Mistral releases Leanstral, 6B-parameter open-source model for Lean 4 formal proof verification
Mistral AI released Leanstral, the first open-source code agent designed specifically for Lean 4 formal proof verification. The model uses 6B active parameters in a sparse 120B architecture and is available under Apache 2.0 license with free API access.
Mistral releases Devstral Medium and Small 1.1 with 61.6% SWE-Bench Verified score
Mistral AI has released two specialized coding models: Devstral Medium, achieving 61.6% on SWE-Bench Verified, and Devstral Small 1.1, scoring 53.6% and released under Apache 2.0 license. The company claims Devstral Medium surpasses Gemini 2.5 Pro and GPT-4.1 at a quarter of the price.
Comments
Loading...