GLM-5.1 released: 754B agentic model outperforms Claude on coding benchmarks
Zhipu AI released GLM-5.1, a 754-billion-parameter model optimized for agentic engineering tasks. The model scores 58.4% on SWE-Bench Pro, outperforming Claude 3.5 Sonnet (57.3%), and demonstrates sustained reasoning over hundreds of iterations.
Key Performance Metrics
GLM-5.1 achieves state-of-the-art performance across multiple agentic benchmarks:
- SWE-Bench Pro: 58.4% (vs. Claude 57.3%, Gemini 3.1 Pro 54.2%)
- NL2Repo (repository generation): 42.7%, trailing Claude (49.8%) but up sharply from GLM-5's 35.9%
- Terminal-Bench 2.0: 63.5% on Terminus-2 suite
- CyberGym: 68.7% (vs. Claude 66.6%)
- BrowseComp with context management: 79.3% (vs. Gemini 84.0%, Claude 75.9%)
Mathematical and scientific reasoning is more mixed: GLM-5.1 scores 95.3% on AIME 2026 and 86.2% on GPQA-Diamond, trailing GPT-5.4 (98.7% on AIME) and Gemini 3.1 Pro (94.3% on GPQA).
Distinctive Agentic Capability
Unlike earlier models, including GLM-5, whose effectiveness plateaus after initial progress on a task, GLM-5.1 is designed to sustain performance over extended problem-solving horizons. According to the developers, the model handles ambiguous problems with improved judgment and stays productive across long sessions: it breaks complex tasks into experiments, reads results, identifies blockers, and revises its strategy over hundreds of iterations and thousands of tool calls.
This iterative reasoning approach distinguishes it from models optimized for single-pass performance.
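The loop the developers describe can be sketched as a minimal plan-act-observe cycle. The model and tool interfaces below are stand-in stubs for illustration, not GLM-5.1's actual API:

```python
# Hypothetical sketch of the iterative agentic loop: plan an experiment,
# run a tool, read the result, and revise until the task is solved or
# the iteration budget is exhausted.
def run_agent(task, call_model, run_tool, max_iters=500):
    history = [("task", task)]
    for _ in range(max_iters):
        action = call_model(history)      # model decides the next experiment
        if action["type"] == "finish":
            return action["result"]       # model judges the task complete
        observation = run_tool(action)    # execute one tool call
        # the full history (including blockers) is fed back each iteration,
        # which is what lets the model revise a failing strategy
        history.append((action["type"], observation))
    return None
```

The key design point is that the entire trajectory, not just the last observation, conditions each decision, so a dead end early in the session can still trigger a tactic change hundreds of steps later.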
Deployment and Quantization
Unsloth released GGUF quantized versions with 17 quant options ranging from 206 GB (1-bit UD-IQ1_M) to 1.51 TB (16-bit BF16). The releases implement Unsloth Dynamic 2.0 quantization, which the developers claim achieves superior accuracy compared to other quantization methods.
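The published sizes are consistent with back-of-envelope size ≈ parameters × bits-per-weight / 8 arithmetic (decimal GB/TB, as the release lists them):

```python
# Sanity check on the quantized file sizes for a 754B-parameter model.
PARAMS = 754e9  # 754 billion weights

def size_gb(bits_per_weight):
    """Approximate file size in decimal GB: params * bpw / 8 bytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

# 16-bit BF16: 754e9 * 16 / 8 bytes = 1508 GB, i.e. about 1.51 TB,
# matching the largest release
bf16_tb = size_gb(16) / 1000

# The 206 GB "1-bit" UD-IQ1_M works out to roughly 2.2 effective bits
# per weight, since sub-2-bit quants keep some tensors at higher precision
effective_bpw = 206e9 * 8 / PARAMS
```

This is why "1-bit" GGUF files are larger than a literal one bit per weight would suggest: embedding and attention tensors are typically stored at higher precision than the quant's nominal bit width.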
Supported inference frameworks include:
- SGLang (v0.5.10+)
- vLLM (v0.19.0+)
- xLLM (v0.8.0+)
- Transformers (v4.5.3+)
- KTransformers (v0.5.3+)
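SGLang and vLLM both expose an OpenAI-compatible chat-completions endpoint once a model is served, so the request body is the same regardless of backend. A minimal sketch, where the model identifier "GLM-5.1" and the local port are assumptions to check against the model card:

```python
import json

# Build an OpenAI-compatible /v1/chat/completions request body.
# The model id "GLM-5.1" is an assumption, not a confirmed identifier.
payload = {
    "model": "GLM-5.1",
    "messages": [
        {"role": "user", "content": "Run the test suite and fix the first failure."}
    ],
    "temperature": 0.6,
    "max_tokens": 1024,
}
# POST this to e.g. http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```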
The model received 13,329 downloads on Hugging Face in its first month.
Availability
GLM-5.1 is available for inference through the Z.ai API Platform. The developers announced that access via chat.z.ai would follow in the coming days. A technical report and a GitHub repository were published alongside the release.
What This Means
GLM-5.1 represents a shift in agentic model design: instead of pursuing raw benchmark scores on isolated tasks, the focus is extended-horizon reasoning and iterative refinement. Its SWE-Bench Pro lead over Claude positions it as the strongest open-access model for software engineering tasks, though Gemini 3.1 Pro and GPT-5.4 maintain mathematical reasoning advantages. The quantized GGUF versions enable local deployment at scale, with memory requirements scaling from 206 GB to 1.51 TB depending on precision needs.
Related Articles
Zhipu AI's GLM-5.1 outperforms GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro through iterative strategy refinement
Zhipu AI has released GLM-5.1, a freely available open-weight model designed for long-running programming tasks that achieves 58.4% on SWE-Bench Pro, edging out GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%). The model's core capability is iterative strategy refinement—it rethinks its approach across hundreds of iterations and thousands of tool calls, recognizing dead ends and shifting tactics without human intervention. However, GLM-5.1 trails on reasoning and knowledge benchmarks, scoring 31% on Humanity's Last Exam compared to Gemini 3.1 Pro's 45%.
Alibaba's Qwen3.6 Plus reaches 78.8 on SWE-bench with 1M context window
Alibaba released Qwen3.6 Plus on April 2, 2026, featuring a 1 million token context window at $0.50 per million input tokens and $3 per million output tokens. The model combines linear attention with sparse mixture-of-experts routing to achieve a 78.8 score on SWE-bench Verified, with significant improvements in agentic coding, front-end development, and reasoning tasks.
Z.ai releases GLM-5.1, 754B parameter open-weight model with improved code generation
Z.ai has released GLM-5.1, a 754-billion parameter open-weight model matching the size of its predecessor GLM-5. The model demonstrates improved ability to generate complex, multi-part outputs like HTML pages with SVG graphics and CSS animations, available via Hugging Face and OpenRouter.
Meta AI app jumps to No. 5 on App Store following Muse Spark launch
Meta's AI app surged from No. 57 to No. 5 on the U.S. App Store within 24 hours of launching Muse Spark, Meta's new multimodal AI model. The model accepts voice, text, and image inputs and features reasoning capabilities for science and math tasks, visual coding, and multi-agent functionality.