model release · Zhipu AI

Zhipu AI releases GLM-5.2 with 1M token context and 62.1% SWE-bench Pro score

TL;DR

Zhipu AI released GLM-5.2, a 753 billion parameter model with a 1 million token context window. The model scores 62.1% on SWE-bench Pro and introduces the IndexShare architecture, which reduces per-token FLOPs by 2.9× at 1M context length. It is released under the MIT license with no regional restrictions.

June 18, 2026 · 8:06 AM · 2 min read

GLM-5.2 — Quick Specs

Context window: 1000K tokens

Input: $0.826/1M tokens

Output: $2.596/1M tokens


Zhipu AI released GLM-5.2, a 753 billion parameter model with a 1 million token context window and enhanced coding capabilities. The model scores 62.1% on SWE-bench Pro, positioning it between GPT-5.5 (58.6%) and Claude Opus 4.8 (69.2%) according to the company's benchmarks.

Key specifications

GLM-5.2 achieves the following benchmark scores according to Zhipu AI:

SWE-bench Pro: 62.1%
NL2Repo: 48.9%
DeepSWE: 46.2%
ProgramBench: 63.7%
AIME 2026: 99.2%
GPQA-Diamond: 91.2%
HMMT November 2025: 94.4%

The model supports a 1 million token context length and is available in FP8 quantized format. It has 753 billion parameters. Pricing has not been disclosed.

Technical architecture

GLM-5.2 introduces IndexShare, which reuses the same indexer across every four sparse attention layers. According to Zhipu AI, this reduces per-token FLOPs by 2.9× at 1M context length compared to the previous architecture.
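As a rough illustration of the sharing pattern described above, the sketch below maps each sparse attention layer to the layer whose indexer output it would reuse. It assumes groups of four consecutive layers, with the first layer in each group running the indexer. The layer count and function names are hypothetical. The code counts indexer invocations only and does not reproduce the 2.9× FLOPs figure, which depends on details not given here.

```python
# Hypothetical sketch of indexer sharing across groups of sparse attention layers.
# GROUP_SIZE = 4 comes from the article; everything else is an illustrative assumption.

GROUP_SIZE = 4


def indexer_owner(layer: int, group_size: int = GROUP_SIZE) -> int:
    """Return the layer whose indexer output this layer reuses (assumed: first in its group)."""
    return (layer // group_size) * group_size


def indexer_runs(num_layers: int, group_size: int = GROUP_SIZE) -> int:
    """Count how many layers actually run an indexer when it is shared per group."""
    return len({indexer_owner(i, group_size) for i in range(num_layers)})


if __name__ == "__main__":
    num_layers = 12  # hypothetical layer count, not a GLM-5.2 figure
    for layer in range(num_layers):
        print(f"layer {layer:2d} uses indexer from layer {indexer_owner(layer)}")
    print("indexer runs with sharing:", indexer_runs(num_layers))
    print("indexer runs without sharing:", num_layers)
```

Under these assumptions the indexer runs once per group rather than once per layer, which is where the saving would come from.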

The model includes an improved MTP (Multi-Token Prediction) layer for speculative decoding, which the company claims increases acceptance length by up to 20%.
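To put the acceptance-length claim in concrete terms, the sketch below estimates how many verification steps a generation takes before and after a 20% increase. It assumes each step emits, on average, a number of tokens equal to the acceptance length. The baseline value and output length are made-up inputs, not published figures.

```python
import math


def decode_steps(total_tokens: int, acceptance_length: float) -> int:
    """Estimated verification steps, assuming each step emits `acceptance_length` tokens on average."""
    if acceptance_length <= 0:
        raise ValueError("acceptance_length must be positive")
    return math.ceil(total_tokens / acceptance_length)


if __name__ == "__main__":
    total_tokens = 3000         # hypothetical output length
    baseline = 2.5              # hypothetical baseline acceptance length
    improved = baseline * 1.20  # "up to 20%" taken at its upper bound

    before = decode_steps(total_tokens, baseline)
    after = decode_steps(total_tokens, improved)
    print(f"steps before: {before}, after: {after}")
    print(f"step reduction: {1 - after / before:.1%}")
```

Under this simple model, a 20% longer acceptance length cuts the step count by a factor of 1/1.2, or roughly 17%, rather than a full 20%.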

GLM-5.2 offers multiple "thinking effort levels" for coding tasks, allowing developers to balance between performance and latency depending on task complexity.

Deployment and availability

The model is released under the MIT open-source license with no regional restrictions. It supports deployment through:

SGLang (v0.5.13.post1+)
vLLM (v0.23.0+)
Transformers (v0.5.12+)
KTransformers (v0.5.12+)

For Ascend NPU platforms, vLLM-Ascend, xLLM, and SGLang are supported.
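Before attempting a deployment, it can help to confirm that the installed engine meets the minimum version listed above. The sketch below uses only the standard library's `importlib.metadata`. The distribution names (`sglang`, `vllm`) are assumed to match the installed package names, only two of the listed engines are included, and the small parser handles just dotted release numbers with an optional `.postN` suffix (pre-release tags are ignored).

```python
import re
from importlib import metadata

# Minimum versions as listed in the article; distribution names are assumptions.
MINIMUMS = {
    "sglang": "0.5.13.post1",
    "vllm": "0.23.0",
}


def parse_version(text: str) -> tuple:
    """Parse 'X.Y.Z' or 'X.Y.Z.postN' into a comparable (release, post) tuple."""
    match = re.match(r"v?(\d+(?:\.\d+)*)(?:\.post(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Drop trailing zeros so that '0.23' and '0.23.0' compare equal.
    while release and release[-1] == 0:
        release = release[:-1]
    # A plain release sorts before any post-release of the same number.
    post = int(match.group(2)) if match.group(2) else -1
    return release, post


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed (needs >= {minimum})")
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        print(f"{name}: {installed} ({status}, needs >= {minimum})")
```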

API access is available through the Z.ai API Platform, though pricing details have not been published.

Benchmark positioning

On coding benchmarks, GLM-5.2 improves on its predecessor GLM-5.1 (58.4% on SWE-bench Pro) and, by Zhipu AI's figures, exceeds Qwen3.7-Max (60.6%) and MiniMax M3 (59%). It trails Claude Opus 4.8 (69.2%). DeepSeek-V4-Pro is listed at 55.4%, which is below GLM-5.2's 62.1% on SWE-bench Pro, so any deficit against that model would have to come from other metrics.
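The SWE-bench Pro figures quoted in this article can be lined up directly. The snippet below sorts the scores as reported, assuming each parenthesised figure refers to SWE-bench Pro. All values are vendor-supplied and none are independently verified. The helper names are illustrative.

```python
# SWE-bench Pro scores as quoted in this article (percent, vendor-reported).
SWE_BENCH_PRO = {
    "Claude Opus 4.8": 69.2,
    "GLM-5.2": 62.1,
    "Qwen3.7-Max": 60.6,
    "MiniMax M3": 59.0,
    "GPT-5.5": 58.6,
    "GLM-5.1": 58.4,
    "DeepSeek-V4-Pro": 55.4,
}


def ranked(scores: dict) -> list:
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def rank_of(model: str, scores: dict) -> int:
    """1-based position of `model` in the sorted list."""
    names = [name for name, _ in ranked(scores)]
    return names.index(model) + 1


if __name__ == "__main__":
    for position, (name, score) in enumerate(ranked(SWE_BENCH_PRO), start=1):
        print(f"{position}. {name}: {score:.1f}%")
```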

On the HLE reasoning benchmark, GLM-5.2 scores 40.5, behind Claude Opus 4.8 (49.8) and Gemini 3.1 Pro (45), but ahead of DeepSeek-V4-Pro (37.7) and MiniMax M3 (37).

What this means

GLM-5.2 represents Zhipu AI's push into the 1M context space, where models such as Gemini 2.5 Pro already operate. The IndexShare architecture addresses a key challenge in long-context models: computational efficiency at extended lengths. The 2.9× reduction in per-token FLOPs at 1M tokens could make the model more practical to deploy than architectures whose per-token cost grows linearly with context length.

The MIT license removes the usage restrictions common in Chinese AI model licenses, potentially increasing adoption in Western markets. However, without disclosed pricing and independent benchmark verification, actual competitiveness against Anthropic's and OpenAI's offerings remains to be demonstrated in production environments.

Source: huggingface.co


