Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

TL;DR

Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that unifies instruction-following, reasoning, and coding capabilities. The model features configurable reasoning effort per request and a vision encoder trained from scratch for variable image sizes.

April 29, 2026 · 7:21 PM2 min read

Mistral Medium 3.5 — Quick Specs

Context window256K tokens

Compare Mistral Medium 3.5 with other models →

Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that handles instruction-following, reasoning, and coding in unified weights.

Model Specifications

Parameters: 128B (dense architecture)
Context window: 256k tokens
Modality: Multimodal input (text and images), text output
License: Modified MIT (open-source with revenue restrictions)

The model replaces Mistral Medium 3.1 and Magistral in Le Chat, and replaces Devstral 2 in Mistral's coding agent Vibe.

Key Technical Features

Mistral Medium 3.5 introduces configurable reasoning effort, allowing the same model to switch between fast responses and complex reasoning tasks. According to Mistral AI, users can set reasoning_effort="none" for quick replies or reasoning_effort="high" for complex agentic tasks.

The company trained the vision encoder from scratch to handle variable image sizes and aspect ratios. The model supports native function calling and JSON output for agentic applications.

Benchmark Performance

According to Mistral AI:

τ³-Telecom: 91.4%
SWE-Bench Verified: 77.6%

Mistral claims the model supersedes all previous Mistral coding models including Devstral across all benchmarks. The company states it achieves "strong results" on instruction following, reasoning (math), and coding benchmarks, though specific scores were not disclosed for all tests.

Language and Capabilities

The model supports dozens of languages including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic. Mistral emphasizes "best-in-class agentic capabilities" and strong adherence to system prompts.

Deployment and Availability

Mistral Medium 3.5 is available through:

Mistral AI API
vLLM (recommended for production)
SGLang
llama.cpp (text only via Unsloth's GGUF)
Ollama
Transformers library

For faster local inference, Mistral released an accompanying EAGLE model for use with vLLM or SGLang. Fine-tuning is supported via Axolotl, Unsloth, and vLLM.

Recommended settings: temperature 0.7 for reasoning_effort="high" and 0.0-0.7 for reasoning_effort="none" depending on task.

What This Means

Mistral's unified architecture approach differs from competitors who maintain separate specialized models. The configurable reasoning represents an implementation of test-time compute scaling, allowing users to trade latency for performance on-demand. At 128B parameters, this sits between mid-tier and frontier models, targeting users who need strong performance without the cost of 400B+ parameter models. The Modified MIT license with revenue restrictions makes this effectively open-weight rather than fully open-source.

Source: huggingface.co ↗

mistral model-release multimodal reasoning coding open-weights 128b-parameters

model releaseApril 27, 2026

Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing

Alibaba's Qwen Team has released Qwen3.6 27B, a 27-billion parameter multimodal language model with a 262,144-token context window. The model accepts text, image, and video inputs and includes a built-in thinking mode for extended reasoning, with pricing at $0.195 per million input tokens and $1.56 per million output tokens.

model releaseApril 29, 2026

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA released Nemotron 3 Nano Omni, a 31B parameter (30B active, 3B per token) multimodal model supporting video, audio, image, and text inputs. The model features a 256K token context window, reasoning mode with chain-of-thought, and tool calling capabilities.

model releaseApril 29, 2026

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

NVIDIA has released Nemotron 3 Nano Omni 30B-A3B, a multimodal large language model with 31 billion parameters using a Mamba2-Transformer hybrid Mixture of Experts architecture. The model supports video, audio, image, and text inputs with a 256K token context window and includes a dedicated reasoning mode with chain-of-thought capabilities.

model releaseApril 28, 2026

Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter

Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.

Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

Mistral Medium 3.5 — Quick Specs

Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

Model Specifications

Key Technical Features

Benchmark Performance

Language and Capabilities

Deployment and Availability

What This Means

Related Articles

Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter

Comments