Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks
Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.
Benchmark Performance
ZAYA1-8B scores 89.1% on AIME 2026, outperforming Qwen3-4B-Thinking-2507 (77.5%) and Gemma-4-E4B-it (50.3%). According to Zyphra, the model is competitive with significantly larger reasoning models:
- AIME 2026: 89.1% (vs. 90.2% for Qwen3-Next-80B-A3B-Think with 80B total parameters)
- HMMT February 2026: 71.6% (vs. 79.3% for Qwen3-Next-80B)
- LiveCodeBench v6: 63.8% (comparable to larger models)
- GPQA-Diamond: 71.0%
- MMLU-Pro: 74.2%
- IFEval: 85.8%
The model also scores 59.3% on IMO-AnswerBench and 32.2% on APEX-shortlist, significantly ahead of same-class models.
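To put the head-to-head numbers above in one place, the short Python sketch below computes the gap in percentage points between ZAYA1-8B and each model the article compares it against. The scores are copied from the figures quoted above; the dictionary and function names are illustrative only, and the sketch assumes the "Qwen3-Next-80B" HMMT entry refers to the same Qwen3-Next-80B-A3B-Think model.

```python
# Scores in percent, copied from the figures quoted in the article.
AIME_2026 = {
    "ZAYA1-8B": 89.1,
    "Qwen3-Next-80B-A3B-Think": 90.2,
    "Qwen3-4B-Thinking-2507": 77.5,
    "Gemma-4-E4B-it": 50.3,
}
HMMT_FEB_2026 = {
    "ZAYA1-8B": 71.6,
    "Qwen3-Next-80B": 79.3,  # assumed to be the same model as the AIME entry
}


def gaps(scores, reference="ZAYA1-8B"):
    """Return the reference score minus each other model's score, in points."""
    base = scores[reference]
    return {
        name: round(base - score, 1)
        for name, score in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for bench, scores in [("AIME 2026", AIME_2026), ("HMMT Feb 2026", HMMT_FEB_2026)]:
        for name, gap in gaps(scores).items():
            print(f"{bench}: ZAYA1-8B vs {name}: {gap:+.1f} points")
```

A positive gap means ZAYA1-8B scored higher. On these figures it trails the 80B model by about one point on AIME 2026 and by several points on HMMT, while leading the two smaller models by a wide margin.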
Architecture and Efficiency
ZAYA1-8B uses a mixture-of-experts architecture in which only 760M of its 8.4B total parameters are active during inference. Zyphra positions this design as enabling on-device deployment while remaining competitive with much larger models such as Mistral-Small-4-119B (6B active, 119B total) and Intellect-3 (12B active, 106B total).
Running the model requires installing Zyphra's forked versions of the vLLM and Transformers libraries. Deployment requires the --mamba-cache-dtype float32 and --dtype bfloat16 flags and uses a custom reasoning parser.
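As a rough illustration of what such a launch might look like, the Python sketch below assembles a server command line with the two dtype flags named above. It assumes Zyphra's fork keeps the upstream `vllm serve` entry point and the upstream `--reasoning-parser` option. The repository id `Zyphra/ZAYA1-8B` is a placeholder, and the parser name is left unset because the article does not give it.

```python
import shlex


def build_serve_command(model_id, reasoning_parser=None):
    """Assemble a `vllm serve` argv list with the dtype flags named in the article.

    `model_id` and `reasoning_parser` are supplied by the caller; the article
    does not state the exact repository id or the parser's name.
    """
    cmd = [
        "vllm", "serve", model_id,
        "--dtype", "bfloat16",
        "--mamba-cache-dtype", "float32",
    ]
    if reasoning_parser is not None:
        # Only added when the caller knows the parser name from Zyphra's docs.
        cmd += ["--reasoning-parser", reasoning_parser]
    return cmd


if __name__ == "__main__":
    # Placeholder repository id, not confirmed by the article.
    print(shlex.join(build_serve_command("Zyphra/ZAYA1-8B")))
```

The function only builds the argument list; passing it to `subprocess.run` would start the server, provided the forked libraries are installed.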
Technical Specifications
- Active parameters: 760M
- Total parameters: 8.4B
- Model type: Mixture of experts with reasoning capabilities
- Inference format: Requires vLLM server with custom flags
- Recommended dtype: bfloat16 with float32 mamba cache
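The gap between active and total parameters matters because the two numbers drive different costs: active parameters set the per-token compute, while all parameters must still be stored. The Python sketch below works through that arithmetic using the parameter counts quoted in this article. The memory figure is a back-of-the-envelope estimate that assumes 2 bytes per parameter for bfloat16 weights and ignores caches and activations; the helper names are illustrative.

```python
# (active, total) parameter counts in billions, as quoted in the article.
MODELS = {
    "ZAYA1-8B": (0.76, 8.4),
    "Qwen3-Next-80B": (3.0, 80.0),
    "Intellect-3": (12.0, 106.0),
    "Mistral-Small-4-119B": (6.0, 119.0),
}

BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each weight in 16 bits


def active_fraction(active_b, total_b):
    """Share of the model's parameters used for each token."""
    return active_b / total_b


def weight_memory_gb(total_b, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Approximate weight storage in GB (10^9 bytes), excluding caches."""
    # Billions of parameters times bytes per parameter gives gigabytes.
    return total_b * bytes_per_param


if __name__ == "__main__":
    for name, (active_b, total_b) in MODELS.items():
        frac = active_fraction(active_b, total_b)
        mem = weight_memory_gb(total_b)
        print(f"{name}: {frac:.1%} active, ~{mem:.1f} GB of bf16 weights")
```

For ZAYA1-8B this gives roughly 9% of parameters active per token and about 16.8 GB of bfloat16 weights, compared with about 160 GB for an 80B-parameter model at the same precision.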
Availability
The post-trained reasoning version is available on Hugging Face. Zyphra has also released the pretraining base model separately. Pricing information has not been disclosed.
What This Means
ZAYA1-8B demonstrates that mixture-of-experts architectures can achieve frontier-level mathematical reasoning with a fraction of the active parameters typically required. The 760M active parameter count makes it viable for edge deployment scenarios where models like Qwen3-Next-80B (3B active, 80B total) would be impractical. However, the model's relative weakness on creative writing tasks (62.97% on Creative Writing v3 vs. 83.75% for Gemma-4-E4B) and agentic benchmarks (39.22% on BFCL-v4) suggests the efficiency gains come with tradeoffs in general capability. The requirement for custom library forks may limit immediate adoption.