Mistral Releases Mistral 3 Family: 675B-Parameter Large 3 MoE and Three Edge Models Under Apache 2.0

TL;DR

Mistral has released Mistral 3, which includes Mistral Large 3, a sparse mixture-of-experts model with 41B active and 675B total parameters, and three Ministral 3 edge models (3B, 8B, 14B). All models are multimodal, released under the Apache 2.0 license, and available today on multiple platforms.

June 18, 2026 · 8:53 AM · 2 min read

Mistral has released Mistral 3, a model family spanning from 3B to 675B parameters, all under the Apache 2.0 license. The release includes Mistral Large 3, a sparse mixture-of-experts architecture with 41B active parameters and 675B total parameters, alongside three Ministral 3 edge models at 3B, 8B, and 14B sizes.

Mistral Large 3 Specifications

Mistral Large 3 was trained from scratch on 3,000 NVIDIA H200 GPUs. According to Mistral, the model ranks #2 among open-source non-reasoning models on LMArena and #6 among all open-source models. The company claims the model achieves "parity with the best instruction-tuned open-weight models" on general prompts.

Pricing for Mistral Large 3:

Input: $0.50 per 1M tokens
Output: $1.50 per 1M tokens
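
To make the quoted rates concrete, the following minimal Python sketch estimates what a request might cost at the prices above. The function name `request_cost` and the example token counts are illustrative assumptions, not part of Mistral's announcement, and real bills may differ depending on the platform.

```python
# Rates quoted in the article for Mistral Large 3 (USD per 1M tokens).
INPUT_RATE_PER_M = 0.50
OUTPUT_RATE_PER_M = 1.50


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-token rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2,000 prompt tokens and 500 generated tokens.
    per_request = request_cost(2_000, 500)
    print(f"per request: ${per_request:.6f}")
    print(f"per 1M requests: ${per_request * 1_000_000:,.2f}")
```

Under these assumed token counts, a request costs about $0.00175, or roughly $1,750 per million requests. Because output tokens are priced at three times the input rate, generation length tends to dominate the total for long responses.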

Both base and instruction-tuned versions are available. A reasoning version is planned for future release.

Ministral 3 Edge Models

The Ministral 3 series includes three parameter sizes: 3B, 8B, and 14B. Each size offers base, instruct, and reasoning variants with image understanding capabilities. Mistral claims the 14B reasoning variant achieves 85% on AIME 2025.

Pricing for Ministral 3 8B (pricing for other sizes not disclosed):

Input: $0.15 per 1M tokens
Output: $0.15 per 1M tokens

According to Mistral, the instruct models generate "an order of magnitude fewer tokens" than comparable models while matching or exceeding performance.

Technical Implementation

Mistral partnered with NVIDIA, vLLM, and Red Hat for deployment optimization. The company released a checkpoint in NVFP4 format using llm-compressor, enabling Mistral Large 3 to run on a single 8×A100 or 8×H100 node via vLLM. NVIDIA integrated Blackwell attention and MoE kernels for efficient inference on GB200 NVL72 systems.
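
A rough back-of-the-envelope check suggests why a 4-bit checkpoint matters for single-node serving. The Python sketch below counts weight storage only, and it assumes 80 GB of memory per GPU, which the article does not state. It ignores quantization scale factors, any layers kept at higher precision, the KV cache, and activations, so treat the numbers as lower bounds rather than a sizing guide.

```python
TOTAL_PARAMS = 675e9  # total parameter count quoted in the article
GPUS_PER_NODE = 8     # the article's 8xA100 / 8xH100 node

# Assumption: 80 GB of memory per GPU (the article does not state the variant).
ASSUMED_GB_PER_GPU = 80


def weight_footprint_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes) at a given precision."""
    return total_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    node_gb = GPUS_PER_NODE * ASSUMED_GB_PER_GPU
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_footprint_gb(TOTAL_PARAMS, bits)
        fits = "fits" if gb < node_gb else "does not fit"
        print(f"{label}: {gb:,.1f} GB of weights, {fits} in {node_gb} GB")
```

Under these assumptions, 16-bit weights need about 1,350 GB and 8-bit weights about 675 GB, both more than the assumed 640 GB node, while 4-bit weights need about 337.5 GB and leave headroom for the cache and activations.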

For edge deployment, NVIDIA delivers optimized deployments on DGX Spark, RTX PCs, and Jetson devices.

Availability

All Mistral 3 models are available today on Mistral AI Studio, Amazon Bedrock, Azure Foundry, Hugging Face, Modal, IBM WatsonX, OpenRouter, Fireworks, Unsloth AI, and Together AI. NVIDIA NIM and AWS SageMaker availability is coming soon.

What This Means

Releasing a 675B-parameter model under Apache 2.0 makes Mistral Large 3 the largest permissively licensed model release to date, which could accelerate enterprise adoption of open-weight alternatives to proprietary models. The sparse MoE architecture, with 41B active parameters, positions Large 3 as computationally efficient compared with dense models of similar capability. Real-world cost-effectiveness, however, will depend on actual serving infrastructure requirements and on the efficiency gains from the optimized NVFP4 format.
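
The efficiency argument can be illustrated with simple arithmetic. The Python sketch below uses the common rule of thumb of about 2 FLOPs per used parameter per generated token, which is an approximation introduced here rather than a figure from Mistral, and it ignores attention cost, routing overhead, and memory bandwidth.

```python
TOTAL_PARAMS = 675e9   # total parameters, per the article
ACTIVE_PARAMS = 41e9   # active parameters, per the article


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that take part in one forward pass."""
    return active / total


def approx_forward_flops_per_token(params_used: float) -> float:
    """Rule-of-thumb estimate: about 2 FLOPs per parameter used per token."""
    return 2 * params_used


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    sparse = approx_forward_flops_per_token(ACTIVE_PARAMS)
    dense = approx_forward_flops_per_token(TOTAL_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"compute vs. a hypothetical dense 675B model: {dense / sparse:.1f}x less")
```

By this estimate roughly 6% of the parameters are used per token, about 16 times less compute than a hypothetical dense model of the same total size. All 675B parameters still have to be stored and served, which is why the memory footprint remains the main infrastructure constraint.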

Source: mistral.ai
