SWE-Bench

7 articles tagged with SWE-Bench

June 9, 2026

model release · Cohere

Cohere Releases North Mini Code 1.0: 30B-Parameter MoE Model With 256K Context for Agentic Coding

Cohere Labs has released North Mini Code 1.0, a 30B-parameter sparse Mixture-of-Experts model with 3B active parameters and a 256K context window. The Apache 2.0-licensed model is optimized for agentic software engineering, uses 128 experts with 8 activated per token, and is trained specifically for tool use in coding tasks.

June 9, 2026 · 5:21 PM

May 28, 2026

model release · Mistral AI

Mistral Releases Medium 3.5: 128B Model with Cloud Coding Agents and 77.6% SWE-Bench Verified

Mistral AI released Medium 3.5, a 128B dense model with a 256K context window that scores 77.6% on SWE-Bench Verified. The model powers new remote coding agents in Mistral Vibe that run asynchronously in the cloud, plus a new Work mode in Le Chat for multi-step agentic tasks.

May 28, 2026 · 10:09 AM

model release · Mistral AI

Mistral releases Devstral Medium and Small 1.1 with 61.6% SWE-Bench Verified score

Mistral AI has released two specialized coding models: Devstral Medium, which achieves 61.6% on SWE-Bench Verified, and Devstral Small 1.1, which scores 53.6% and is released under the Apache 2.0 license. The company claims Devstral Medium surpasses Gemini 2.5 Pro and GPT-4.1 at a quarter of the price.

May 28, 2026 · 9:51 AM

April 29, 2026

model release · OpenAI

OpenAI releases GPT-5.5 with 82.7% Terminal-Bench score, API priced at $5/$30 per million tokens

OpenAI released GPT-5.5 on April 23, its first retrained base model since GPT-4.5, scoring 82.7% on Terminal-Bench 2.0 versus GPT-5.4's 75.1% and Claude Opus 4.7's 69.4%. API pricing is set at $5 per million input tokens and $30 per million output tokens, exactly double GPT-5.4's rates.

April 29, 2026 · 9:21 AM
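
As a rough illustration of what the GPT-5.5 pricing above means per request, the sketch below computes cost from token counts. The $5/$30 rates come from the summary; the GPT-5.4 rates are only inferred from the "exactly double" statement, and the function name and token counts are hypothetical.

```python
# USD per million tokens, as quoted in the summary above.
GPT_5_5_INPUT_PER_M = 5.00
GPT_5_5_OUTPUT_PER_M = 30.00

# Assumption: inferred from "exactly double GPT-5.4's rates", not quoted directly.
GPT_5_4_INPUT_PER_M = GPT_5_5_INPUT_PER_M / 2
GPT_5_4_OUTPUT_PER_M = GPT_5_5_OUTPUT_PER_M / 2


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = GPT_5_5_INPUT_PER_M,
                 output_per_m: float = GPT_5_5_OUTPUT_PER_M) -> float:
    """Return the cost in USD of one request at the given per-million rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200K input tokens, 50K output tokens.
    new = request_cost(200_000, 50_000)
    old = request_cost(200_000, 50_000, GPT_5_4_INPUT_PER_M, GPT_5_4_OUTPUT_PER_M)
    print(f"GPT-5.5: ${new:.2f}  GPT-5.4 (inferred): ${old:.2f}")
```

Because both rates double, the cost of any fixed mix of input and output tokens doubles as well.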

April 20, 2026

model release · Moonshot AI +1

Moonshot AI Releases Kimi K2.6: 1T-Parameter MoE Model with 256K Context and Agent Swarm Capabilities

Moonshot AI has released Kimi K2.6, an open-source multimodal model with 1 trillion total parameters (32B activated) and a 256K context window. The model achieves 80.2% on SWE-Bench Verified and 58.6% on SWE-Bench Pro, and supports horizontal scaling to 300 sub-agents executing 4,000 coordinated steps.

April 20, 2026 · 4:06 PM

April 9, 2026

model release · Zhipu AI

GLM-5.1 released: 754B agentic model outperforms Claude on coding benchmarks

Zhipu AI released GLM-5.1, a 754-billion-parameter model optimized for agentic engineering tasks. The model scores 58.4% on SWE-Bench Pro, outperforming Claude 3.5 Sonnet (57.3%), and demonstrates sustained reasoning capability over hundreds of iterations.

April 9, 2026 · 6:50 PM

April 7, 2026

model release

GLM-5.1 achieves 58.4% on SWE-Bench Pro with sustained agentic reasoning over hundreds of iterations

Zhipu AI has released GLM-5.1, a 754-billion-parameter model designed for agentic engineering, with significantly improved coding capabilities over its predecessor. The model achieves 58.4% on SWE-Bench Pro and keeps improving over hundreds of tool calls and iterations, unlike earlier models that plateau quickly.

April 7, 2026 · 5:51 PM
