Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks

TL;DR

Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 8.4B total parameters, of which 760M are active during inference. The model scores 89.1% on AIME 2026, about a point behind the 80B-parameter Qwen3-Next-80B-A3B-Think (90.2%), while keeping its active footprint small enough to target on-device deployment.

Zyphra ZAYA1-8B Achieves Frontier-Level Math Performance with 760M Active Parameters

Zyphra has released ZAYA1-8B, a mixture-of-experts (MoE) language model with 760M active parameters and 8.4B total parameters. On mathematical reasoning benchmarks it performs competitively with models roughly 10x its size or larger.

Benchmark Performance

ZAYA1-8B scores 89.1% on AIME 2026, outperforming Qwen3-4B-Thinking-2507 (77.5%) and Gemma-4-E4B-it (50.3%). According to Zyphra, the model matches or exceeds the performance of significantly larger reasoning models, although in the two head-to-head figures listed here it trails Qwen3-Next-80B-A3B-Think:

  • AIME 2026: 89.1% (vs. 90.2% for Qwen3-Next-80B-A3B-Think with 80B total parameters)
  • HMMT February 2026: 71.6% (vs. 79.3% for Qwen3-Next-80B)
  • LiveCodeBench v6: 63.8% (comparable to larger models)
  • GPQA-Diamond: 71.0%
  • MMLU-Pro: 74.2%
  • IFEval: 85.8%

The model also scores 59.3% on IMO-AnswerBench and 32.2% on APEX-shortlist, significantly ahead of same-class models.
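
The head-to-head figures in the list above can be turned into explicit gaps. The following Python sketch uses only the scores quoted in this article; the dictionary layout and the function name are illustrative and not part of any Zyphra tooling.

```python
# Scores (percent) exactly as quoted in this article.
SCORES = {
    "AIME 2026": {"ZAYA1-8B": 89.1, "Qwen3-Next-80B-A3B-Think": 90.2},
    "HMMT February 2026": {"ZAYA1-8B": 71.6, "Qwen3-Next-80B-A3B-Think": 79.3},
}


def gap_in_points(benchmark: str) -> float:
    """Return ZAYA1-8B's score minus the Qwen3-Next score, in percentage points."""
    row = SCORES[benchmark]
    return round(row["ZAYA1-8B"] - row["Qwen3-Next-80B-A3B-Think"], 1)


for name in SCORES:
    # A negative value means ZAYA1-8B trails the larger model.
    print(f"{name}: {gap_in_points(name):+.1f} points")
```

On these figures ZAYA1-8B trails by about 1.1 points on AIME 2026 and 7.7 points on HMMT February 2026, so "matches" is a fair description of the first result and a generous one for the second.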

Architecture and Efficiency

ZAYA1-8B uses a mixture-of-experts architecture in which only 760M of its 8.4B total parameters are active during inference. The small active footprint keeps per-token compute low enough to target on-device deployment, while the model remains competitive with much larger models such as Mistral-Small-4-119B (6B active, 119B total) and Intellect-3 (12B active, 106B total).
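
As a rough illustration of what these parameter counts imply, the sketch below computes the share of parameters each model activates and how its active count compares with ZAYA1-8B's. The figures are the ones quoted in this article; the helper names are made up for the example.

```python
# (active, total) parameter counts in billions, as quoted in this article.
MODELS = {
    "ZAYA1-8B": (0.76, 8.4),
    "Mistral-Small-4-119B": (6.0, 119.0),
    "Intellect-3": (12.0, 106.0),
}


def active_fraction(name: str) -> float:
    """Fraction of a model's total parameters that are active during inference."""
    active, total = MODELS[name]
    return active / total


def active_ratio_vs_zaya(name: str) -> float:
    """How many times more active parameters a model uses than ZAYA1-8B."""
    return MODELS[name][0] / MODELS["ZAYA1-8B"][0]


for name in MODELS:
    print(
        f"{name}: {active_fraction(name):.1%} of parameters active, "
        f"{active_ratio_vs_zaya(name):.1f}x ZAYA1-8B's active count"
    )
```

By this arithmetic ZAYA1-8B activates roughly 9% of its parameters, and the two comparison models run about 8x and 16x as many active parameters per forward pass.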

Running the model requires installing Zyphra's forks of the vLLM and Transformers libraries. Serving it with vLLM requires the --mamba-cache-dtype float32 and --dtype bfloat16 flags, along with a custom reasoning parser.
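
A minimal sketch of how such a launch command might be assembled is shown below. It assumes Zyphra's vLLM fork keeps the standard `vllm serve` entry point and `--reasoning-parser` option. The model identifier is a hypothetical placeholder, and the parser name is left unset because the article gives neither. The script only builds and prints the command; it does not start a server.

```python
import shlex
from typing import List, Optional

# Hypothetical placeholder: the article does not give the Hugging Face repo id.
MODEL_ID = "<zaya1-8b-model-id>"


def build_serve_command(model_id: str, reasoning_parser: Optional[str] = None) -> List[str]:
    """Assemble a vLLM launch command with the dtype flags named in the article."""
    cmd = [
        "vllm", "serve", model_id,
        "--mamba-cache-dtype", "float32",  # flag and value quoted in the article
        "--dtype", "bfloat16",             # flag and value quoted in the article
    ]
    if reasoning_parser is not None:
        # Assumption: the fork exposes the parser through vLLM's usual option.
        cmd += ["--reasoning-parser", reasoning_parser]
    return cmd


# Print a shell-quoted command line for inspection.
print(shlex.join(build_serve_command(MODEL_ID)))
```

Check the model card for the actual repository name and parser name before running the printed command.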

Technical Specifications

  • Active parameters: 760M
  • Total parameters: 8.4B
  • Model type: Mixture of experts with reasoning capabilities
  • Inference format: Requires vLLM server with custom flags
  • Recommended dtype: bfloat16 with float32 mamba cache
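
For a sense of scale, the weight footprint implied by these specifications can be estimated with simple arithmetic. The sketch below assumes every parameter is stored in bfloat16 (2 bytes) and that all experts stay resident in memory. It ignores the float32 mamba cache, activations, and any other runtime overhead, so treat the result as a lower bound and not a measured figure.

```python
TOTAL_PARAMS = 8.4e9   # total parameters, from the article
BYTES_PER_PARAM = 2    # bfloat16 is 16 bits wide


def weight_footprint_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights in GiB (2**30 bytes)."""
    return n_params * bytes_per_param / 2**30


print(f"{weight_footprint_gib(TOTAL_PARAMS, BYTES_PER_PARAM):.1f} GiB")
```

Under these assumptions the weights alone come to roughly 15.6 GiB. Only 760M parameters are active during inference, but the full 8.4B must still be stored, so memory is sized by the total count.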

Availability

The post-trained reasoning version is available on Hugging Face. Zyphra has also released the pretraining base model separately. Pricing information has not been disclosed.

What This Means

ZAYA1-8B demonstrates that mixture-of-experts architectures can reach near-frontier mathematical reasoning with a fraction of the active parameters typically required. The 760M active parameter count makes it a candidate for edge deployment scenarios where models like Qwen3-Next-80B (3B active, 80B total) would be impractical. However, the model's relative weakness on creative writing tasks (62.97% on Creative Writing v3 vs. 83.75% for Gemma-4-E4B) and on agentic benchmarks (39.22% on BFCL-v4) suggests the efficiency gains come with tradeoffs in general capability. The requirement for custom library forks may also limit immediate adoption.
