Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.
Hy3 Preview — Quick Specs
Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts (MoE) model with 21 billion active parameters. The model features a 256K context window and includes 3.8 billion parameters in its MTP (Multi-Token Prediction) layer. Both base and instruct versions are now available on Hugging Face, ModelScope, and GitCode under the Tencent Hy Community License.
Architecture and Specifications
Hy3-preview uses a sparse MoE architecture with 192 experts and top-8 activation, meaning 8 experts are activated per token. The model has 80 standard transformer layers plus 1 MTP layer, with 64 attention heads using grouped query attention (8 KV heads). It requires approximately 8 H20 GPUs or equivalent hardware for inference.
The model supports BF16 precision and has a vocabulary size of 120,832 tokens. According to Tencent, this is "the first model trained on our rebuilt infrastructure, and the strongest we've shipped so far."
Benchmark Performance
On base model benchmarks, Hy3-preview scores:
- MATH: 76.28% (4-shot)
- GSM8K: 95.37% (4-shot)
- MMLU: 87.42% (5-shot)
- MMLU-Pro: 65.76% (5-shot)
- LiveCodeBench-v6: 34.86% (1-shot)
- CRUXEval-I: 71.19% (3-shot)
- C-Eval: 89.80% (5-shot)
- MMMLU: 80.15% (5-shot)
The company claims particularly strong results on coding agent benchmarks including SWE-bench Verified and Terminal-Bench 2.0, as well as search agent benchmarks like BrowseComp and WideSearch. Tencent also reports "excellent results" on the Tsinghua Qiuzhen College Math PhD qualifying exam (Spring 2026) and China High School Biology Olympiad 2025, though specific scores were not disclosed.
Context Learning and Reasoning Modes
Tencent built two proprietary benchmarks—CL-bench and CL-bench-Life—to measure context learning ability in real business scenarios. The model supports three reasoning modes via the reasoning_effort parameter: "no_think" for direct responses, "low" for moderate reasoning, and "high" for deep chain-of-thought processing.
Recommended inference parameters are temperature=0.9 and top_p=1.0. The model can be deployed using vLLM or SGLang with MTP-enabled speculative decoding.
Training and Quantization
The release includes a complete training pipeline supporting both full fine-tuning and LoRA, with DeepSpeed ZeRO configurations and LLaMA-Factory integration. Tencent provides AngelSlim, a compression toolkit supporting quantization algorithms, low-bit quantization, and speculative sampling.
Pricing information has not been disclosed. The model is available for commercial use under Tencent's community license.
What This Means
Hy3-preview represents Tencent's entry into the 200B+ parameter MoE space, competing directly with models like DeepSeek-V3 (671B total, 37B active) and Kimi-K2 (1043B total, 32B active). With 21B active parameters versus competitors' 32-37B, Hy3-preview achieves competitive performance while potentially reducing inference costs. The strong LiveCodeBench score (34.86% vs DeepSeek-V3's 29.31%) and emphasis on agent capabilities suggests Tencent is positioning this model for practical coding and agentic applications rather than pure benchmark optimization. The 256K context window and custom context-learning benchmarks indicate a focus on real-world enterprise use cases.
Related Articles
Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows
Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts model with 55 billion active parameters and support for up to 1 million token context windows. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows including agent orchestration, coding agents, and complex enterprise tasks.
Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window
Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts model with 55B active parameters and a 1M token context window. The model uses a hybrid Transformer-Mamba architecture and is available for free through OpenRouter, targeting agentic workflows and multi-step reasoning tasks.
NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning
NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.
NVIDIA Nemotron 3 Ultra launches on AWS SageMaker with 550B parameters, 1M token context window
NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart with 550 billion total parameters and 55 billion active parameters. The model features a hybrid Transformer-Mamba Mixture-of-Experts architecture and supports context windows up to 1 million tokens, targeting agentic AI workloads.
Comments
Loading...