GLM-5.2 Released with 1M Token Context and 753B Parameters Under MIT License
Zhipu AI has released GLM-5.2, a 753 billion parameter model with a 1 million token context window and an MIT open-source license. The model scores 62.1% on SWE-bench Pro and 91.2% on GPQA-Diamond, and offers adjustable reasoning effort levels for coding tasks.
Zhipu AI has released GLM-5.2, a 753 billion parameter model with a 1 million token context window, under an MIT open-source license. Its main architectural change is IndexShare, which shares a single indexer across each group of four sparse attention layers and cuts per-token FLOPs by 2.9× at 1M context length.
Benchmark Performance
According to Zhipu AI, GLM-5.2 achieves:
- SWE-bench Pro: 62.1% (versus 58.4% for GLM-5.1)
- GPQA-Diamond: 91.2% (versus 86.2% for GLM-5.1)
- NL2Repo: 48.9% (versus 42.7% for GLM-5.1)
- DeepSWE: 46.2% (versus 18% for GLM-5.1)
- AIME 2026: 99.2%
- HLE reasoning: 40.5% (54.7% with tools)
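The generation-over-generation gains are easier to compare as percentage-point differences. The short sketch below computes them from the four benchmarks for which the list above gives both a GLM-5.2 and a GLM-5.1 score. The dictionary layout and function name are illustrative only, and the inputs are the company-reported figures, not independent measurements.

```python
# Company-reported scores in percent, copied from the list above.
# Only benchmarks with both a GLM-5.2 and a GLM-5.1 figure are included.
REPORTED_SCORES = {
    "SWE-bench Pro": (62.1, 58.4),
    "GPQA-Diamond": (91.2, 86.2),
    "NL2Repo": (48.9, 42.7),
    "DeepSWE": (46.2, 18.0),
}


def point_gains(scores):
    """Return the GLM-5.2 minus GLM-5.1 difference in percentage points."""
    return {
        name: round(new - old, 1)  # round away floating-point noise
        for name, (new, old) in scores.items()
    }


if __name__ == "__main__":
    gains = point_gains(REPORTED_SCORES)
    # Print the largest improvement first.
    for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
        print(f"{name}: +{gain} points")
```

On these figures, DeepSWE shows by far the largest jump (28.2 points), while the other three benchmarks improve by between 3.7 and 6.2 points.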
The company positions GLM-5.2 as competitive with Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro on long-horizon task benchmarks. These comparisons are company claims and have not been independently verified.
Technical Architecture
GLM-5.2 introduces two key architectural improvements:
- IndexShare: Reuses indexers across sparse attention layers, reducing computational requirements at extended context lengths
- Enhanced MTP layer: Improved for speculative decoding, increasing acceptance length by up to 20%
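The sharing pattern behind IndexShare can be pictured as a mapping from sparse attention layers to indexers. The sketch below is a hedged illustration rather than Zhipu AI's implementation: it assumes the groups are contiguous blocks of four layers, and the function names and the layer count in the usage example are hypothetical. It does not attempt to reproduce the 2.9× figure, which depends on cost details the article does not give.

```python
import math
from collections import defaultdict

# From the article: one indexer is reused across every four sparse
# attention layers. Contiguous grouping is an assumption of this sketch.
SHARE_GROUP_SIZE = 4


def indexer_id_for_layer(layer_idx, group_size=SHARE_GROUP_SIZE):
    """Map a sparse attention layer index to the indexer it would reuse."""
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    return layer_idx // group_size


def indexer_groups(num_sparse_layers, group_size=SHARE_GROUP_SIZE):
    """Group layer indices by the indexer they share."""
    groups = defaultdict(list)
    for layer in range(num_sparse_layers):
        groups[indexer_id_for_layer(layer, group_size)].append(layer)
    return dict(groups)


def indexer_passes(num_sparse_layers, group_size=SHARE_GROUP_SIZE):
    """Count indexer evaluations per token when each group shares one."""
    return math.ceil(num_sparse_layers / group_size)


if __name__ == "__main__":
    # Hypothetical 8-layer stack: two indexer passes instead of eight.
    print(indexer_groups(8))
    print(indexer_passes(8))
```

Under this assumption the number of indexer evaluations per token drops by roughly the group size, which is consistent in direction with the reported savings growing at long context lengths.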
The model supports multiple "thinking effort levels" for coding tasks, allowing developers to balance performance against latency requirements.
Pricing and Availability
Pricing has not yet been disclosed. The model is available through:
- Z.ai API Platform for hosted inference
- Local deployment via SGLang (v0.5.13.post1+), vLLM (v0.23.0+), xLLM (v0.10.0+), Transformers (v0.5.12+), and KTransformers (v0.5.12+)
- Hugging Face model hub (zai-org/GLM-5.2)
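Before attempting a local deployment, it can help to confirm that an installed backend meets the minimum version listed above. The following is a minimal sketch using only the Python standard library. The dictionary keys are assumed pip distribution names that the article does not confirm, xLLM is left out because its distribution name is not known here, and the comparison looks only at the numeric release components, so a suffix such as `.post1` is ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed in the article. The keys are ASSUMED pip
# distribution names; xLLM is omitted because its name is not known here.
MINIMUM_VERSIONS = {
    "sglang": "0.5.13.post1",
    "vllm": "0.23.0",
    "transformers": "0.5.12",
    "ktransformers": "0.5.12",
}


def release_tuple(version_string):
    """Extract the leading numeric release, e.g. '0.5.13.post1' -> (0, 5, 13)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare numeric releases only; pre/post suffixes are ignored."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.23' and '0.23.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_backends(minimums=MINIMUM_VERSIONS):
    """Report, per backend, whether it is installed and recent enough."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_backends().items():
        print(f"{name}: {status}")
```

For strict handling of pre-release and post-release suffixes, a dedicated version parser would be a better choice than this simplified comparison.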
What This Means
GLM-5.2's MIT license carries none of the geographic restrictions common to other frontier models, which could accelerate adoption in regions where licensing has been a barrier. The architectural focus on efficiency at extreme context lengths, a 2.9× reduction in per-token FLOPs at 1M tokens, addresses a critical bottleneck as context windows expand industry-wide. However, the reported scores show GLM-5.2 trailing Claude Opus 4.8 and GPT-5.5 on most coding benchmarks, with particular gaps on SWE-bench Pro (62.1% versus 69.2%) and NL2Repo (48.9% versus 69.7%), suggesting it competes more directly with open-weight alternatives than with proprietary frontier models.