model release

GLM-5.2 Released with 1M Token Context and 753B Parameters Under MIT License

TL;DR

Zhipu AI has released GLM-5.2, a 753 billion parameter model with a 1 million token context window and an MIT open-source license. According to the company, the model scores 62.1% on SWE-bench Pro and 91.2% on GPQA-Diamond, and it offers adjustable reasoning effort levels for coding tasks.

June 16, 2026 · 6:21 PM · 2 min read

GLM-5.2 — Quick Specs

Context window: 1000K tokens

Input: $0.826/1M tokens

Output: $2.596/1M tokens
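To make the listed rates concrete, here is a minimal sketch of a per-request cost estimate. It uses only the two prices shown in the Quick Specs; the function name and the example token counts are hypothetical, and the calculation ignores any caching discounts or tiered pricing a provider might apply.

```python
# Prices copied from the Quick Specs listing (USD per 1M tokens).
INPUT_USD_PER_M = 0.826
OUTPUT_USD_PER_M = 2.596


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-token rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: a full 1M-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(1_000_000, 4_000):.4f}")
```

At these rates a prompt that fills the whole context window costs a little over 80 cents before any output is generated, so input tokens dominate the bill for long-context workloads.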

Zhipu AI has released GLM-5.2, a 753 billion parameter model that delivers a 1 million token context window under an MIT open-source license. Its main architectural change is IndexShare, which shares a single indexer across each group of four sparse attention layers and, according to Zhipu AI, reduces per-token FLOPs by 2.9× at 1M context length.
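The sharing pattern can be pictured as a simple layer-to-indexer mapping. The sketch below is illustrative only: the function names and the layer count are hypothetical, it assumes consecutive layers are grouped in fours, and it counts indexers rather than FLOPs, so it does not reproduce the 2.9× figure.

```python
import math


def indexer_group(layer_idx: int, share_every: int = 4) -> int:
    """Return which shared indexer a sparse attention layer would use."""
    return layer_idx // share_every


def indexer_count(num_sparse_layers: int, share_every: int = 4) -> int:
    """Number of distinct indexers when one serves `share_every` layers."""
    return math.ceil(num_sparse_layers / share_every)


NUM_SPARSE_LAYERS = 8  # hypothetical; the source does not give the layer count
for layer in range(NUM_SPARSE_LAYERS):
    print(f"layer {layer} -> indexer {indexer_group(layer)}")
print(indexer_count(NUM_SPARSE_LAYERS), "indexers instead of", NUM_SPARSE_LAYERS)
```

With eight layers the indexer work is done twice rather than eight times. The overall FLOP saving is smaller than that ratio because the attention computation itself is still performed in every layer.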

Benchmark Performance

According to Zhipu AI, GLM-5.2 achieves:

SWE-bench Pro: 62.1% (versus 58.4% for GLM-5.1)
GPQA-Diamond: 91.2% (versus 86.2% for GLM-5.1)
NL2Repo: 48.9% (versus 42.7% for GLM-5.1)
DeepSWE: 46.2% (versus 18% for GLM-5.1)
AIME 2026: 99.2%
HLE reasoning: 40.5% (54.7% with tools)
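The version-over-version gains are easier to compare when computed explicitly. This sketch uses only the four benchmarks above that report a GLM-5.1 baseline, and it separates percentage-point gains from relative gains, since the two differ sharply when the baseline is low. The helper names are hypothetical.

```python
# Scores as reported by Zhipu AI, in percent: (GLM-5.1, GLM-5.2).
SCORES = {
    "SWE-bench Pro": (58.4, 62.1),
    "GPQA-Diamond": (86.2, 91.2),
    "NL2Repo": (42.7, 48.9),
    "DeepSWE": (18.0, 46.2),
}


def point_gain(old: float, new: float) -> float:
    """Absolute gain in percentage points."""
    return round(new - old, 1)


def relative_gain(old: float, new: float) -> float:
    """Gain relative to the old score, in percent."""
    return round((new - old) / old * 100, 1)


for name, (old, new) in SCORES.items():
    print(f"{name}: +{point_gain(old, new)} points, +{relative_gain(old, new)}% relative")
```

DeepSWE stands out: a 28.2-point gain from an 18% baseline means the reported score more than doubled, while the other three benchmarks improve by 3.7 to 6.2 points.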

The company positions GLM-5.2 as competitive with Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro on long-horizon task benchmarks, though these comparisons are the company's own claims and have not been independently verified.

Technical Architecture

GLM-5.2 introduces two key architectural improvements:

IndexShare: Reuses indexers across sparse attention layers, reducing computational requirements at extended context lengths
Enhanced MTP layer: Improved for speculative decoding, increasing acceptance length by up to 20%
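To see what a longer acceptance length could mean for decoding, here is a rough back-of-the-envelope sketch. It assumes each verification pass of the main model emits, on average, a number of tokens equal to the acceptance length, and it ignores the cost of drafting. The function name is hypothetical, and the 20% input is the upper bound quoted above rather than a typical value.

```python
def fewer_verification_passes(acceptance_gain: float) -> float:
    """Fractional drop in verification passes for the same output length.

    If each pass emits L tokens on average, N tokens need N / L passes.
    Raising L by `acceptance_gain` (0.20 for 20%) scales that count by
    1 / (1 + acceptance_gain).
    """
    if acceptance_gain <= -1.0:
        raise ValueError("acceptance_gain must be greater than -1")
    return 1.0 - 1.0 / (1.0 + acceptance_gain)


print(f"{fewer_verification_passes(0.20):.1%} fewer passes")
```

Under these assumptions, a 20% longer acceptance length translates to roughly one sixth fewer passes of the main model, not a 20% reduction.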

The model supports multiple "thinking effort levels" for coding tasks, allowing developers to balance performance against latency requirements.

Pricing and Availability

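Zhipu AI's announcement does not disclose pricing; the Quick Specs above list $0.826 per 1M input tokens and $2.596 per 1M output tokens. The model is available through: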

Z.ai API Platform for hosted inference
Local deployment via SGLang (v0.5.13.post1+), vLLM (v0.23.0+), xLLM (v0.10.0+), Transformers (v0.5.12+), and KTransformers (v0.5.12+)
Hugging Face model hub (zai-org/GLM-5.2)
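Before attempting a local deployment, it can help to check an installed runtime against the minimum versions listed above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the dictionary keys are display names rather than verified package names, and the parser handles only plain release numbers with an optional `.postN` suffix, not pre-release tags.

```python
import re

# Minimum versions as listed above.
MIN_VERSIONS = {
    "SGLang": "0.5.13.post1",
    "vLLM": "0.23.0",
    "xLLM": "0.10.0",
    "Transformers": "0.5.12",
    "KTransformers": "0.5.12",
}


def parse_version(text: str) -> tuple:
    """Parse 'v0.5.13.post1' into ((0, 5, 13), 1)."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)(?:\.post(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat 0.23 and 0.23.0 as equal
    return tuple(release), int(match.group(2) or 0)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("0.5.13", MIN_VERSIONS["SGLang"]))  # post1 is required
print(meets_minimum("0.23.1", MIN_VERSIONS["vLLM"]))
```

In practice the installed version string could be read with `importlib.metadata.version`, given the correct distribution name for each runtime.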

What This Means

GLM-5.2's MIT license carries none of the geographic restrictions common in other frontier model licenses, which could accelerate adoption in regions where licensing has been a barrier. The architectural focus on efficiency at extreme context lengths, with a reported 2.9× reduction in per-token FLOPs at 1M tokens, addresses a critical bottleneck as context windows expand industry-wide. However, the reported scores show GLM-5.2 trailing Claude Opus 4.8 and GPT-5.5 on most coding benchmarks, with particular gaps on SWE-bench Pro (69.2% vs 62.1%) and NL2Repo (69.7% vs 48.9%). That suggests it competes more directly with open-weight alternatives than with proprietary frontier models.

Source: huggingface.co
