
GLM-5.1 released: 754B agentic model outperforms Claude on coding benchmarks

TL;DR

Zhipu AI released GLM-5.1, a 754B-parameter model optimized for agentic engineering tasks. The model scores 58.4% on SWE-Bench Pro, ahead of Claude 3.5 Sonnet (57.3%), and is reported to sustain productive reasoning over hundreds of iterations.


Zhipu AI released GLM-5.1, a 754B-parameter flagship model built for agentic engineering tasks. The model achieves 58.4% on SWE-Bench Pro, the article's headline metric for software engineering capability, edging out Claude 3.5 Sonnet (57.3%). It also leads Claude on CyberGym and BrowseComp, but trails it on NL2Repo.

Key Performance Metrics

GLM-5.1 posts leading or competitive scores on several agentic benchmarks, though it does not top every one:

  • SWE-Bench Pro: 58.4% (vs. Claude 57.3%, Gemini 3.1 Pro 54.2%)
  • NL2Repo (repo generation): 42.7% (vs. Claude 49.8%; up from GLM-5's 35.9%)
  • Terminal-Bench 2.0: 63.5% (run with Terminus-2)
  • CyberGym: 68.7% (vs. Claude 66.6%)
  • BrowseComp with context management: 79.3% (vs. Gemini 84.0%, Claude 75.9%)

Results on math and science reasoning are strong but not leading: 95.3% on AIME 2026 and 86.2% on GPQA-Diamond, trailing GPT-5.4 (98.7% on AIME 2026) and Gemini 3.1 Pro (94.3% on GPQA-Diamond).
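To make the gaps easier to compare, the reported figures can be tabulated as point margins against the best-scoring rival on each benchmark. The sketch below uses only the numbers quoted above. The dictionary layout and the function name `margin_vs_best_rival` are illustrative choices, and the labels "Claude" and "Gemini" are kept exactly as the article gives them. Terminal-Bench 2.0 is omitted because no rival score is quoted for it.

```python
# Scores in percent, copied from the figures quoted in this article.
SCORES = {
    "SWE-Bench Pro": {"GLM-5.1": 58.4, "Claude": 57.3, "Gemini 3.1 Pro": 54.2},
    "NL2Repo": {"GLM-5.1": 42.7, "Claude": 49.8, "GLM-5": 35.9},
    "CyberGym": {"GLM-5.1": 68.7, "Claude": 66.6},
    "BrowseComp": {"GLM-5.1": 79.3, "Gemini": 84.0, "Claude": 75.9},
    "AIME 2026": {"GLM-5.1": 95.3, "GPT-5.4": 98.7},
    "GPQA-Diamond": {"GLM-5.1": 86.2, "Gemini 3.1 Pro": 94.3},
}


def margin_vs_best_rival(scores, model="GLM-5.1"):
    """Map each benchmark to (best rival, margin in points).

    A positive margin means `model` leads the best-scoring rival listed;
    a negative margin means it trails.
    """
    result = {}
    for bench, row in scores.items():
        rivals = {name: s for name, s in row.items() if name != model}
        best = max(rivals, key=rivals.get)
        result[bench] = (best, round(row[model] - rivals[best], 1))
    return result


if __name__ == "__main__":
    for bench, (rival, margin) in margin_vs_best_rival(SCORES).items():
        print(f"{bench:15s} {margin:+5.1f} pts vs {rival}")
```

On these figures the leads are narrow (about 1 to 2 points on SWE-Bench Pro and CyberGym), while the deficits on NL2Repo and GPQA-Diamond are larger.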

Distinctive Agentic Capability

According to the developers, previous models, including GLM-5, tend to plateau after their initial optimizations, whereas GLM-5.1 is designed to stay effective over extended problem-solving horizons. They say the model handles ambiguous problems with better judgment and remains productive across longer sessions: it breaks complex tasks into experiments, reads the results, identifies blockers, and revises its strategy through hundreds of iterations and thousands of tool calls.

This iterative reasoning approach distinguishes it from models optimized for single-pass performance.
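The experiment, read, unblock, revise cycle described above can be illustrated with a toy loop. This is only a schematic sketch and not GLM-5.1's actual mechanism, which the article does not describe. The function `iterate_until_stalled`, the numeric "experiment", and the step-halving rule are all hypothetical stand-ins for tool calls and strategy changes.

```python
from typing import Callable, List, Tuple


def iterate_until_stalled(
    run_experiment: Callable[[float], float],
    start: float,
    step: float = 4.0,
    max_iters: int = 200,
    min_step: float = 1e-3,
) -> Tuple[float, float, List[str]]:
    """Toy long-horizon loop: run an experiment, read the result, and
    revise the strategy (direction and step size) whenever progress stalls.
    Lower scores are better."""
    best_x, best_score = start, run_experiment(start)
    direction = 1.0
    log: List[str] = []
    for i in range(max_iters):
        if step < min_step:
            log.append(f"iter {i}: step too small, stopping")
            break
        candidate = best_x + direction * step
        score = run_experiment(candidate)  # stands in for a tool call
        if score < best_score:  # result read: progress was made
            best_x, best_score = candidate, score
            log.append(f"iter {i}: improved to {score:.4f}")
        elif direction > 0:  # blocker found: try the opposite direction
            direction = -1.0
            log.append(f"iter {i}: blocked, reversing direction")
        else:  # blocked both ways: revise strategy with a finer step
            direction, step = 1.0, step / 2
            log.append(f"iter {i}: blocked both ways, step is now {step}")
    return best_x, best_score, log


if __name__ == "__main__":
    # Hypothetical objective whose best setting is x = 7.
    x, score, log = iterate_until_stalled(lambda v: (v - 7.0) ** 2, start=0.0)
    print(x, score, len(log))
```

A single-pass approach would stop after the first improvement. The loop instead keeps revising until further experiments stop paying off, which is the behavior the developers attribute to the model.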

Deployment and Quantization

Unsloth released GGUF quantized versions with 17 quant options ranging from 206 GB (1-bit UD-IQ1_M) to 1.51 TB (16-bit BF16). The releases use Unsloth Dynamic 2.0 quantization, which Unsloth claims achieves better accuracy than other quantization methods.
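As a rough sanity check on those sizes, the average number of bits stored per parameter can be estimated from file size and parameter count. The sketch below assumes decimal units (1 GB = 10^9 bytes) and ignores file metadata, so the results are approximate. The helper name `bits_per_weight` is illustrative.

```python
PARAMS = 754e9  # parameter count quoted in the article
GB = 1e9        # assumption: decimal gigabytes
TB = 1e12       # assumption: decimal terabytes


def bits_per_weight(file_size_bytes: float, n_params: float = PARAMS) -> float:
    """Average bits stored per parameter for a file of the given size."""
    return file_size_bytes * 8 / n_params


if __name__ == "__main__":
    print(f"UD-IQ1_M: {bits_per_weight(206 * GB):.2f} bits/weight")
    print(f"BF16:     {bits_per_weight(1.51 * TB):.2f} bits/weight")
```

Under these assumptions the BF16 file works out to about 16 bits per weight, as expected, while the "1-bit" file averages a little over 2 bits per weight. That would be consistent with a mixed-precision scheme that keeps some tensors at higher precision, though the article does not break this down.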

Supported inference frameworks include:

  • SGLang (v0.5.10+)
  • vLLM (v0.19.0+)
  • xLLM (v0.8.0+)
  • Transformers (v4.5.3+)
  • KTransformers (v0.5.3+)
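A quick way to check a local environment against the minimum versions listed above is to compare installed package versions numerically. The sketch below uses only the Python standard library. The distribution names in `MIN_VERSIONS` are assumptions about how these frameworks are published; xLLM is left out because its package name is not given here. The parser is deliberately simple and ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Minimum versions as listed in the article; distribution names are assumed.
MIN_VERSIONS = {
    "sglang": "0.5.10",
    "vllm": "0.19.0",
    "transformers": "4.5.3",
    "ktransformers": "0.5.3",
}


def parse_version(version: str) -> tuple:
    """Return the first three numeric components, padded with zeros."""
    nums = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(nums + [0] * (3 - len(nums)))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Numeric comparison, so that 0.5.10 counts as newer than 0.5.9."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(minimums=MIN_VERSIONS) -> dict:
    """Report 'ok', 'too old', or 'not installed' for each distribution."""
    report = {}
    for dist, minimum in minimums.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[dist] = "ok"
        else:
            report[dist] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for dist, status in check_environment().items():
        print(f"{dist:15s} {status}")
```

Comparing versions as tuples of integers rather than strings matters here: a plain string comparison would rank "0.5.9" above "0.5.10".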

The model received 13,329 downloads on Hugging Face in its first month.

Availability

GLM-5.1 is available for inference through the Z.ai API Platform. The developers said access via chat.z.ai would follow in the coming days. A technical report and GitHub repository were published alongside the release.

What This Means

GLM-5.1 reflects a shift in agentic model design: rather than optimizing only for single-pass scores on isolated tasks, it targets extended-horizon reasoning and iterative refinement. Its 1.1-point SWE-Bench Pro lead over Claude positions it among the strongest open-access models for software engineering tasks, though GPT-5.4 leads on AIME 2026 and Gemini 3.1 Pro on GPQA-Diamond. The quantized GGUF versions make local deployment possible, with file sizes ranging from 206 GB to 1.51 TB depending on precision.
