Mistral releases Devstral Medium and Small 1.1 with 61.6% SWE-Bench Verified score

TL;DR

Mistral AI has released two specialized coding models: Devstral Medium, achieving 61.6% on SWE-Bench Verified, and Devstral Small 1.1, scoring 53.6% and released under Apache 2.0 license. The company claims Devstral Medium surpasses Gemini 2.5 Pro and GPT-4.1 at a quarter of the price.

May 28, 2026 · 9:51 AM2 min read

Devstral Medium — Quick Specs

Input$0.4/1M tokens

Output$2/1M tokens

Compare Devstral Medium with other models →

Mistral releases Devstral Medium and Small 1.1 with 61.6% SWE-Bench Verified score

Mistral AI has released two specialized coding models developed in collaboration with All Hands AI: Devstral Medium and Devstral Small 1.1. The models are designed specifically for agentic coding tasks, with emphasis on generalization across different prompts and agentic scaffolds.

Devstral Medium: API-only proprietary model

Devstral Medium achieves 61.6% on SWE-Bench Verified, according to Mistral AI. The company claims the model surpasses Gemini 2.5 Pro and GPT-4.1 at a quarter of the price, though specific benchmark comparisons were not provided.

Pricing for Devstral Medium (devstral-medium-2507):

Input: $0.40 per 1M tokens
Output: $2.00 per 1M tokens

The model is available through Mistral's API and supports on-premise deployment for enterprise customers. Custom fine-tuning is available for enterprises requiring task-specific optimization.

Devstral Small 1.1: Open-source Apache 2.0 release

Devstral Small 1.1 scores 53.6% on SWE-Bench Verified. Mistral claims this sets a new state-of-the-art for open models without test-time scaling, though the model maintains the same 24B parameter architecture as its predecessor.

Pricing for Devstral Small 1.1 (devstral-small-2507):

Input: $0.10 per 1M tokens
Output: $0.30 per 1M tokens

Key improvements over the previous version:

Enhanced performance on SWE-Bench Verified (previous score not disclosed)
Better generalization to different coding environments
Support for both Mistral function calling and XML formats
Optimized for use with OpenHands agentic framework

Technical specifications

Both models support:

Multiple agentic scaffolds and prompting formats
Integration with coding environments
Function calling capabilities

Devstral Small 1.1 is released under the Apache 2.0 license, allowing unrestricted commercial and research use. The model is available for local deployment. Devstral Medium remains proprietary but can be deployed on private infrastructure through enterprise agreements.

Context window size, training data cutoff, and detailed architecture specifications were not disclosed.

What this means

Mistral is positioning itself in the increasingly competitive coding model space with a two-tier strategy: an open-source model for local deployment and experimentation, and a proprietary API model targeting enterprise customers. The 61.6% SWE-Bench Verified score for Devstral Medium, if independently verified, would be competitive with leading coding models, though claims of cost advantage over Gemini 2.5 Pro and GPT-4.1 require context on benchmark parity. The Apache 2.0 release of the 24B parameter Small model provides the open-source community with a capable coding agent foundation without licensing restrictions.

Source: mistral.ai ↗

Mistral Devstral coding-models open-source Apache-2.0 SWE-Bench code-agents model-release

model releaseJuly 11, 2026

Cohere releases 2B parameter Arabic speech recognition model with 25.9% average WER

Cohere and Cohere Labs released Cohere Transcribe Arabic, a 2B parameter automatic speech recognition model optimized for Arabic dialects and Arabic-English code-switching. The open-source model achieves a 25.9% average word error rate across major Arabic ASR benchmarks, outperforming models up to 30B parameters.

model releaseJuly 9, 2026

OpenAI announces GPT-5.6 with three models (Sol, Terra, Luna) and ChatGPT Work agent tool

OpenAI released GPT-5.6 in three model tiers—Sol (flagship reasoning), Terra (mainstream), and Luna (instant)—positioning them against Anthropic's Claude models. The company claims GPT-5.6 Sol scores 53.6 on Agents' Last Exam, 13.1 points above Claude Fable 5, while completing tasks 61% faster. ChatGPT Work, a desktop productivity agent similar to Claude Cowork, launches simultaneously for Pro, Enterprise, and Edu users.

model releaseJuly 9, 2026

OpenAI releases GPT-5.6 family in three sizes: Luna at $1/$6, Terra at $2.50/$15, Sol at $5/$30 per 1M tokens

OpenAI released its GPT-5.6 flagship model family in three sizes: Luna ($1/$6 per 1M tokens), Terra ($2.50/$15), and Sol ($5/$30). The company claims GPT-5.6 Sol scores 53.6 on the Agents' Last Exam benchmark, outperforming Claude Fable 5's score by 13.1 points.

model releaseJuly 9, 2026

OpenAI Releases GPT-5.6 Luna Pro with Extended Reasoning Mode at $1/$6 Per Million Tokens

OpenAI has released GPT-5.6 Luna Pro, a reasoning-enhanced variant of GPT-5.6 Luna with a 1 million token context window. The model is priced at $1 per million input tokens and $6 per million output tokens, with a knowledge cutoff date of February 2026.

Mistral releases Devstral Medium and Small 1.1 with 61.6% SWE-Bench Verified score

Devstral Medium — Quick Specs

Mistral releases Devstral Medium and Small 1.1 with 61.6% SWE-Bench Verified score

Devstral Medium: API-only proprietary model

Devstral Small 1.1: Open-source Apache 2.0 release

Technical specifications

What this means

Related Articles

Cohere releases 2B parameter Arabic speech recognition model with 25.9% average WER

OpenAI announces GPT-5.6 with three models (Sol, Terra, Luna) and ChatGPT Work agent tool

OpenAI releases GPT-5.6 family in three sizes: Luna at $1/$6, Terra at $2.50/$15, Sol at $5/$30 per 1M tokens

OpenAI Releases GPT-5.6 Luna Pro with Extended Reasoning Mode at $1/$6 Per Million Tokens

Comments