model release

Z.AI releases GLM-5.2 with 1M token context, outperforms GPT-5.5 on long-horizon coding benchmarks

TL;DR

Z.AI has released GLM-5.2, an open-source model with a 1M-token context window under an MIT license. On FrontierSWE, a long-horizon coding benchmark, GLM-5.2 trails Claude Opus 4.8 by 1% while outperforming GPT-5.5 by 1%, and achieves 81.0 on Terminal-Bench 2.1 compared to Opus 4.8's 85.0.

June 17, 2026 · 9:21 AM2 min read

GLM-5.2 — Quick Specs

Context window1000K tokens

Input$0.826/1M tokens

Output$2.596/1M tokens

Compare GLM-5.2 with other models →

Z.AI Releases GLM-5.2 with 1M Token Context, Outperforms GPT-5.5 on Long-Horizon Coding Benchmarks

Z.AI has released GLM-5.2, an open-source model with a 1M-token context window released under an MIT license with no regional restrictions. The model is designed specifically for long-horizon coding tasks and agent-based workflows.

Benchmark Performance

On FrontierSWE, which measures multi-hour technical projects including systems optimization and ML research, GLM-5.2 trails Claude Opus 4.8 by 1% while outperforming GPT-5.5 by 1% and Claude Opus 4.7 by 11%. According to Z.AI, the company ranks as the highest-performing open-source model across three long-horizon benchmarks.

On standard coding benchmarks, GLM-5.2 scores 81.0 on Terminal-Bench 2.1, approaching Claude Opus 4.8's 85.0 and surpassing Gemini 3.1 Pro. The model achieves 62.1 on SWE-bench Pro, compared to its predecessor GLM-5.1's 58.4.

On PostTrainBench, where agents receive an H100 GPU to improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8, according to Z.AI. On SWE-Marathon, covering compiler building and kernel optimization, GLM-5.2 trails Opus 4.8 by 13%.

Technical Architecture

GLM-5.2 introduces IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at 1M context length. The company claims this approach maintains quality across long coding-agent trajectories rather than simply accepting more tokens.

The model includes effort level control, allowing users to balance capability against execution speed. At comparable token budgets, Z.AI positions GLM-5.2's capability between Claude Opus 4.7 and 4.8, with a Max effort level for additional computation on challenging tasks.

The improved MTP (multi-token prediction) layer for speculative decoding increases acceptance length by up to 20% through techniques including IndexShare, KV cache reuse, rejection sampling, and end-to-end TV loss training. In ablation testing, acceptance length improved from 4.56 tokens to 5.47 tokens.

Training and Availability

Z.AI expanded 1M-context training specifically for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and debugging. The training incorporated IndexShare from mid-training at 128K sequence length.

The model is available under an MIT open-source license with no regional limits. Pricing has not been disclosed.

What This Means

GLM-5.2 represents the first truly competitive open-source alternative for long-context coding work, with performance within striking distance of frontier closed models. The 2.9× FLOP reduction at 1M tokens addresses a critical efficiency bottleneck that has limited practical deployment of ultra-long-context models. Z.AI's focus on sustained quality across messy agent trajectories—rather than just benchmark performance on clean inputs—suggests the model may handle real-world coding workflows better than context window size alone would indicate. The MIT license removes barriers that have limited enterprise adoption of other open models.

Source: huggingface.co ↗

glm-52 z-ai open-source long-context coding 1m-tokens benchmark claude-opus

model releaseJuly 29, 2026

OpenAI's GPT Transcribe Cuts Word Error Rate to 3.31% but Trails ElevenLabs, Google, and Mistral

OpenAI released GPT Transcribe and GPT Live Transcribe, improving word error rate to 3.31 percent and cutting prices 25 percent to $0.0045 per minute. Independent benchmarks still place OpenAI behind ElevenLabs, Google, and Mistral on transcription accuracy.

model releaseJuly 28, 2026

Microsoft Releases VibeVoice-ASR-BitNet: 1.58GB Speech Recognition Model Runs Real-Time on CPU, No GPU Needed

Microsoft Research released VibeVoice-ASR-BitNet, a quantized 1.58GB version of its VibeVoice-ASR speech recognition model that achieves real-time inference (RTF < 1) on as few as 3 CPU threads. The model runs 1.6-2.3x faster than Whisper.cpp on commodity x86 and ARM hardware, with a modest accuracy tradeoff.

model releaseAugust 1, 2026

OpenAI Reportedly Developing 'Astra' Model Family for Multi-Day Autonomous Problem-Solving

OpenAI is reportedly developing a new model family called Astra, designed to coordinate multiple agents on complex problems over hours or days. The models are already in testing and would be first to go through a planned U.S. government pre-release review, according to The Information.

model releaseJuly 31, 2026

Google DeepMind Launches Gemini Robotics 2, a Single VLA Model for Arms to Humanoids

Google DeepMind has introduced Gemini Robotics 2, a vision-language-action model it calls its most advanced yet, designed to control everything from tabletop robot arms to full-body humanoids. The company also released Gemini Robotics ER 2, an embodied reasoning model that replaces ER 1.6.