Mistral Releases Mistral Large 3 with 675B Parameters and Three Ministral 3 Models Under Apache 2.0

TL;DR

Mistral AI has released Mistral 3, consisting of Mistral Large 3—a sparse mixture-of-experts model with 675B total parameters and 41B active parameters—and three Ministral 3 models at 3B, 8B, and 14B parameters. All models are released under the Apache 2.0 license with multimodal capabilities including image understanding.

May 28, 2026 · 9:54 AM2 min read

Mistral Large 3 — Quick Specs

Input$0.5/1M tokens

Output$1.5/1M tokens

Compare Mistral Large 3 with other models →

Mistral Releases Mistral Large 3 with 675B Parameters and Three Ministral 3 Models Under Apache 2.0

Mistral Large 3 Technical Specifications

Mistral Large 3 is a sparse mixture-of-experts architecture trained from scratch on 3,000 NVIDIA H200 GPUs. The model uses 41B active parameters and 675B total parameters, making it Mistral's first MoE model since the Mixtral series.

According to Mistral AI, the model ranks #2 in the open-source non-reasoning models category on the LMArena leaderboard (#6 among all open-source models overall). The company claims the instruction-tuned version achieves parity with the best instruction-tuned open-weight models on general prompts while demonstrating what it calls "best-in-class performance" on multilingual conversations in languages other than English and Chinese.

Both base and instruction fine-tuned versions are available under Apache 2.0. A reasoning variant is announced as coming soon.

Ministral 3 Series Details

The Ministral 3 series includes three model sizes: 3B, 8B, and 14B parameters. For each size, Mistral releases base, instruct, and reasoning variants—all with multimodal image understanding capabilities under Apache 2.0.

Mistral AI claims the Ministral 3 reasoning 14B variant achieves 85% accuracy on AIME 2025. The company states that instruct models "match or exceed the performance of comparable models while often producing an order of magnitude fewer tokens."

Infrastructure and Deployment

All Mistral 3 models were trained on NVIDIA Hopper GPUs with HBM3e memory. Mistral collaborated with NVIDIA, vLLM, and Red Hat to optimize deployment:

Mistral Large 3 can run on a single 8×A100 or 8×H100 node using vLLM
A checkpoint in NVFP4 format built with llm-compressor is available
NVIDIA integrated Blackwell attention and MoE kernels for the sparse architecture
Support for prefill/decode disaggregated serving and speculative decoding on GB200 NVL72
Ministral models optimized for NVIDIA DGX Spark, RTX PCs, and Jetson edge devices

Inference support is enabled through TensorRT-LLM and SGLang for the complete model family.

Availability

Mistral 3 is available immediately on Mistral AI Studio, Amazon Bedrock, Azure Foundry, Hugging Face, Modal, IBM WatsonX, OpenRouter, Fireworks, Unsloth AI, and Together AI. NVIDIA NIM and AWS SageMaker availability is listed as coming soon.

Pricing information has not been disclosed. Model documentation and research papers are available through Mistral AI's documentation hub and Hugging Face.

What This Means

Mistral Large 3's 675B parameter count with 41B active parameters positions it as one of the largest openly-licensed MoE models available. The Apache 2.0 license removes commercial restrictions that limit other "open" models. The simultaneous release of smaller Ministral variants (3B-14B) with reasoning capabilities addresses the growing demand for edge deployment and cost-efficient inference, though independent verification of Mistral's performance claims on multilingual tasks and token efficiency will be necessary to confirm competitive positioning.

Source: mistral.ai ↗

mistral-ai mistral-large-3 ministral-3 mixture-of-experts open-source apache-2.0 multimodal nvidia

model releaseJuly 9, 2026

NVIDIA Releases Audex-30B-A3B: Unified Audio-Text Model With 1M Token Context and Speech Generation

NVIDIA released Audex-30B-A3B, a unified audio-text model built on the Nemotron-Cascade-2-30B-A3B backbone. The model handles audio understanding, speech recognition and translation, text-to-speech, audio generation, and speech-to-speech while supporting up to 1M token context length.

model releaseJuly 11, 2026

Cohere releases 2B parameter Arabic speech recognition model with 25.9% average WER

Cohere and Cohere Labs released Cohere Transcribe Arabic, a 2B parameter automatic speech recognition model optimized for Arabic dialects and Arabic-English code-switching. The open-source model achieves a 25.9% average word error rate across major Arabic ASR benchmarks, outperforming models up to 30B parameters.

model releaseJuly 9, 2026

NVIDIA releases Nemotron-Labs-3-Puzzle-75B, compressed from 120B to 75B parameters with 2× throughput

NVIDIA has released Nemotron-Labs-3-Puzzle-75B-A9B, a compressed variant of Nemotron-3-Super that reduces the model from 120.7B total/12.8B active parameters to 75.3B total/9.3B active parameters. According to NVIDIA, the model achieves approximately 2× higher server throughput on a single 8×B200 node and increases sustainable 1M-token single-H100 concurrency from 1 request to 8 requests while maintaining strong accuracy across benchmarks.

model releaseJuly 10, 2026

Meta stock surges 15% as company releases Muse Spark 1.1 agentic model and Muse Image generator

Meta's stock surged 15% this week following the release of two AI models: Muse Spark 1.1 for agentic and coding workloads on Thursday, and Muse Image for image generation on Tuesday. The releases come three months after Meta introduced its first foundation model, Muse Spark, as the company competes with OpenAI, Anthropic, and Google.

Mistral Releases Mistral Large 3 with 675B Parameters and Three Ministral 3 Models Under Apache 2.0

Mistral Large 3 — Quick Specs

Mistral Releases Mistral Large 3 with 675B Parameters and Three Ministral 3 Models Under Apache 2.0

Mistral Large 3 Technical Specifications

Ministral 3 Series Details

Infrastructure and Deployment

Availability

What This Means

Related Articles

NVIDIA Releases Audex-30B-A3B: Unified Audio-Text Model With 1M Token Context and Speech Generation

Cohere releases 2B parameter Arabic speech recognition model with 25.9% average WER

NVIDIA releases Nemotron-Labs-3-Puzzle-75B, compressed from 120B to 75B parameters with 2× throughput

Meta stock surges 15% as company releases Muse Spark 1.1 agentic model and Muse Image generator

Comments