Z.AI releases GLM-5.2 with 1M token context, outperforms GPT-5.5 on long-horizon coding benchmarks

TL;DR

Z.AI has released GLM-5.2, an open-source model with a 1M-token context window under an MIT license. On FrontierSWE, a long-horizon coding benchmark, GLM-5.2 trails Claude Opus 4.8 by 1% while outperforming GPT-5.5 by 1%, and achieves 81.0 on Terminal-Bench 2.1 compared to Opus 4.8's 85.0.

Z.AI has released GLM-5.2, an open-source model with a 1M-token context window, under an MIT license with no regional restrictions. The model is designed specifically for long-horizon coding tasks and agent-based workflows.

Benchmark Performance

On FrontierSWE, which measures multi-hour technical projects including systems optimization and ML research, GLM-5.2 trails Claude Opus 4.8 by 1% while outperforming GPT-5.5 by 1% and Claude Opus 4.7 by 11%. According to Z.AI, GLM-5.2 ranks as the highest-performing open-source model across three long-horizon benchmarks.

On standard coding benchmarks, GLM-5.2 scores 81.0 on Terminal-Bench 2.1, approaching Claude Opus 4.8's 85.0 and surpassing Gemini 3.1 Pro. The model achieves 62.1 on SWE-bench Pro, compared to its predecessor GLM-5.1's 58.4.
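To put the scores above in perspective, it helps to separate the gap in points from the gap relative to the reference score. The short sketch below does that arithmetic for the two comparisons whose raw scores the article reports; the helper name `gap` is illustrative and not part of any Z.AI tooling.

```python
from typing import Tuple


def gap(score: float, reference: float) -> Tuple[float, float]:
    """Return (gap in points, gap relative to the reference score)."""
    points = score - reference
    return points, points / reference


# Scores as reported in the article.
tb_points, tb_rel = gap(81.0, 85.0)    # GLM-5.2 vs Claude Opus 4.8 on Terminal-Bench 2.1
swe_points, swe_rel = gap(62.1, 58.4)  # GLM-5.2 vs GLM-5.1 on SWE-bench Pro

print(f"Terminal-Bench 2.1 vs Opus 4.8: {tb_points:+.1f} points ({tb_rel:+.1%})")
print(f"SWE-bench Pro vs GLM-5.1: {swe_points:+.1f} points ({swe_rel:+.1%})")
```

By this calculation GLM-5.2 sits 4.0 points (about 4.7% in relative terms) behind Opus 4.8 on Terminal-Bench 2.1, and 3.7 points (about 6.3%) ahead of GLM-5.1 on SWE-bench Pro.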

On PostTrainBench, where agents receive an H100 GPU to improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8, according to Z.AI. On SWE-Marathon, covering compiler building and kernel optimization, GLM-5.2 trails Opus 4.8 by 13%.

Technical Architecture

GLM-5.2 introduces IndexShare, which shares a single indexer across each group of four sparse attention layers, reducing per-token FLOPs by 2.9× at 1M context length. The company claims this approach maintains quality across long coding-agent trajectories rather than simply accepting more tokens.
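The sharing pattern can be illustrated with a toy sketch. This is not Z.AI's implementation: it assumes the first layer in each group of four runs the indexer and the other three reuse its token selection, and `toy_indexer` is a hypothetical stand-in for a learned top-k selector.

```python
from typing import Callable, Dict, List

GROUP_SIZE = 4  # from the article: one indexer shared across four sparse attention layers


def indexer_owner(layer: int, group_size: int = GROUP_SIZE) -> int:
    """Layer whose indexer output this layer reuses (assumed: first layer of its group)."""
    return (layer // group_size) * group_size


def run_sparse_layers(num_layers: int, select_tokens: Callable[[int], List[int]]) -> dict:
    """Walk the layers, calling the indexer once per group and reusing its result."""
    cache: Dict[int, List[int]] = {}
    selected: Dict[int, List[int]] = {}
    indexer_calls = 0
    for layer in range(num_layers):
        owner = indexer_owner(layer)
        if owner not in cache:
            # Only the first layer of each group pays for token selection.
            cache[owner] = select_tokens(owner)
            indexer_calls += 1
        selected[layer] = cache[owner]
    return {"indexer_calls": indexer_calls, "selected": selected}


def toy_indexer(layer: int) -> List[int]:
    # Hypothetical selector: returns fixed token positions purely for illustration.
    return [0, layer + 1, layer + 2]


result = run_sparse_layers(num_layers=8, select_tokens=toy_indexer)
print(result["indexer_calls"])  # 2 indexer calls for 8 layers
```

The sketch counts indexer calls only. It does not attempt to reproduce the reported 2.9× figure, which covers total per-token FLOPs, including the attention computation that every layer still performs.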

The model includes effort level control, allowing users to balance capability against execution speed. At comparable token budgets, Z.AI positions GLM-5.2's capability between Claude Opus 4.7 and 4.8, with a Max effort level for additional computation on challenging tasks.

The improved MTP (multi-token prediction) layer for speculative decoding increases acceptance length by up to 20% through techniques including IndexShare, KV cache reuse, rejection sampling, and end-to-end TV loss training. In ablation testing, acceptance length improved from 4.56 tokens to 5.47 tokens.
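As a quick check, the ablation numbers line up with the "up to 20%" claim. The snippet below computes the relative gain from the two acceptance lengths quoted above; the variable names are illustrative.

```python
def relative_gain(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


baseline_accept_len = 4.56  # tokens accepted per step before the MTP changes (from the article)
improved_accept_len = 5.47  # tokens accepted per step after the MTP changes (from the article)

gain = relative_gain(baseline_accept_len, improved_accept_len)
print(f"{gain:.1%}")  # 20.0%
```

The gain is just under 20% (roughly 19.96%), which matches the headline figure after rounding.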

Training and Availability

Z.AI expanded 1M-context training specifically for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and debugging. The training incorporated IndexShare from mid-training at 128K sequence length.

The model is available under an MIT open-source license with no regional limits. Pricing has not been disclosed.

What This Means

GLM-5.2 represents the first truly competitive open-source alternative for long-context coding work, with performance within striking distance of frontier closed models. The 2.9× FLOP reduction at 1M tokens addresses a critical efficiency bottleneck that has limited practical deployment of ultra-long-context models. Z.AI's focus on sustained quality across messy agent trajectories, rather than just benchmark performance on clean inputs, suggests the model may handle real-world coding workflows better than context window size alone would indicate. The MIT license removes barriers that have limited enterprise adoption of other open models.
