model releaseMistral AI

Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

TL;DR

Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that unifies instruction-following, reasoning, and coding capabilities. The model features configurable reasoning effort per request and a vision encoder trained from scratch for variable image sizes.

2 min read
0

Mistral Releases Medium 3.5: 128B Dense Model With 256k Context and Configurable Reasoning

Mistral AI released Mistral Medium 3.5, a 128B parameter dense model with a 256k context window that handles instruction-following, reasoning, and coding in unified weights.

Model Specifications

  • Parameters: 128B (dense architecture)
  • Context window: 256k tokens
  • Modality: Multimodal input (text and images), text output
  • License: Modified MIT (open-source with revenue restrictions)

The model replaces Mistral Medium 3.1 and Magistral in Le Chat, and replaces Devstral 2 in Mistral's coding agent Vibe.

Key Technical Features

Mistral Medium 3.5 introduces configurable reasoning effort, allowing the same model to switch between fast responses and complex reasoning tasks. According to Mistral AI, users can set reasoning_effort="none" for quick replies or reasoning_effort="high" for complex agentic tasks.

The company trained the vision encoder from scratch to handle variable image sizes and aspect ratios. The model supports native function calling and JSON output for agentic applications.

Benchmark Performance

According to Mistral AI:

  • τ³-Telecom: 91.4%
  • SWE-Bench Verified: 77.6%

Mistral claims the model supersedes all previous Mistral coding models including Devstral across all benchmarks. The company states it achieves "strong results" on instruction following, reasoning (math), and coding benchmarks, though specific scores were not disclosed for all tests.

Language and Capabilities

The model supports dozens of languages including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic. Mistral emphasizes "best-in-class agentic capabilities" and strong adherence to system prompts.

Deployment and Availability

Mistral Medium 3.5 is available through:

  • Mistral AI API
  • vLLM (recommended for production)
  • SGLang
  • llama.cpp (text only via Unsloth's GGUF)
  • Ollama
  • Transformers library

For faster local inference, Mistral released an accompanying EAGLE model for use with vLLM or SGLang. Fine-tuning is supported via Axolotl, Unsloth, and vLLM.

Recommended settings: temperature 0.7 for reasoning_effort="high" and 0.0-0.7 for reasoning_effort="none" depending on task.

What This Means

Mistral's unified architecture approach differs from competitors who maintain separate specialized models. The configurable reasoning represents an implementation of test-time compute scaling, allowing users to trade latency for performance on-demand. At 128B parameters, this sits between mid-tier and frontier models, targeting users who need strong performance without the cost of 400B+ parameter models. The Modified MIT license with revenue restrictions makes this effectively open-weight rather than fully open-source.

Related Articles

model release

MiniMax Releases M3: 428B-Parameter Multimodal Model with 1M Context Window and 15× Decode Speedup

MiniMax has released M3, a multimodal model with approximately 428 billion parameters and 23 billion activated parameters. The model supports a 1 million token context window and uses MiniMax Sparse Attention to achieve 9× prefill and 15× decode speedups compared to its predecessor M2.

model release

Anthropic releases Fable 5, bringing capabilities of restricted Mythos model to public with $10/$50 per 1M token pricing

Anthropic has released Fable 5, making capabilities from its previously restricted Mythos model available to the public. The company claims Fable 5 beats GPT-5.5, Gemini 3.1 Pro, and its own Opus 4.8 in internal testing, with pricing set at $10 per million input tokens and $50 per million output tokens after a free trial period ending June 22.

model release

Moonshot AI releases Kimi K2.7 Code with 1T parameters, 256K context window, 30% lower thinking token usage

Moonshot AI has released Kimi K2.7 Code, a 1 trillion parameter Mixture-of-Experts model designed for long-horizon coding tasks. The model features a 256K context window and reduces thinking token usage by approximately 30% compared to its predecessor K2.6.

model release

Apple releases AFM 3 lineup: 20B-parameter on-device model and cloud AI running on Google's Nvidia infrastructure

Apple announced five third-generation foundation models at WWDC26, headlined by AFM 3 Core Advanced—a 20-billion-parameter sparse model that runs on-device by activating only 1-4 billion parameters at a time. For the first time, Apple extended Private Cloud Compute to third-party infrastructure, with AFM 3 Cloud Pro running on Nvidia GPUs in Google Cloud.

Comments

Loading...