Tencent Releases Hy3-Preview: 295B-Parameter MoE Model with 21B Active Parameters
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts model with 21 billion active parameters and a 256K context window. The model scores 76.28% on MATH and 34.86% on LiveCodeBench-v6, with particularly strong performance on coding agent tasks.
Tencent has released Hy3-preview, a 295-billion-parameter Mixture-of-Experts (MoE) model with 21 billion active parameters. The model features a 256K context window and includes 3.8 billion parameters in its MTP (Multi-Token Prediction) layer. Both base and instruct versions are now available on Hugging Face, ModelScope, and GitCode under the Tencent Hy Community License.
Architecture and Specifications
Hy3-preview uses a sparse MoE architecture with 192 experts and top-8 activation: each token is routed to the eight highest-scoring experts. The model has 80 standard transformer layers plus one MTP layer, with 64 attention heads using grouped-query attention (8 KV heads). Inference requires approximately eight NVIDIA H20 GPUs or equivalent hardware.
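Top-8 activation follows the standard sparse-gating pattern: a router scores all 192 experts for each token and dispatches the token only to the eight highest-scoring ones. Below is a minimal PyTorch sketch of that routing step, assuming conventional softmax gating; the shapes and names are illustrative, not Tencent's implementation.

```python
import torch

# Minimal sketch of top-k expert routing with softmax gating.
# Shapes and the gating scheme are illustrative assumptions,
# not Tencent's code.
NUM_EXPERTS, TOP_K, HIDDEN = 192, 8, 4096

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)
tokens = torch.randn(16, HIDDEN)                       # 16 token hidden states

scores = router(tokens).softmax(dim=-1)                # (16, 192) routing weights
weights, expert_ids = torch.topk(scores, TOP_K, dim=-1)
weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the top-8

print(expert_ids[0])  # the 8 experts the first token would be dispatched to
```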
The model supports BF16 precision and has a vocabulary size of 120,832 tokens. According to Tencent, this is "the first model trained on our rebuilt infrastructure, and the strongest we've shipped so far."
Benchmark Performance
On base model benchmarks, Hy3-preview scores:
- MATH: 76.28% (4-shot)
- GSM8K: 95.37% (4-shot)
- MMLU: 87.42% (5-shot)
- MMLU-Pro: 65.76% (5-shot)
- LiveCodeBench-v6: 34.86% (1-shot)
- CRUXEval-I: 71.19% (3-shot)
- C-Eval: 89.80% (5-shot)
- MMMLU: 80.15% (5-shot)
The company claims particularly strong results on coding agent benchmarks including SWE-bench Verified and Terminal-Bench 2.0, as well as search agent benchmarks like BrowseComp and WideSearch. Tencent also reports "excellent results" on the Tsinghua Qiuzhen College Math PhD qualifying exam (Spring 2026) and China High School Biology Olympiad 2025, though specific scores were not disclosed.
Context Learning and Reasoning Modes
Tencent built two proprietary benchmarks, CL-bench and CL-bench-Life, to measure context learning ability in real business scenarios. The model supports three reasoning modes via the reasoning_effort parameter: "no_think" for direct responses, "low" for moderate reasoning, and "high" for deep chain-of-thought processing.
Recommended inference parameters are temperature=0.9 and top_p=1.0. The model can be deployed using vLLM or SGLang with MTP-enabled speculative decoding.
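As a concrete illustration, here is a hedged sketch of how these settings might be passed to a locally served Hy3-preview through an OpenAI-compatible endpoint such as one exposed by vLLM. The endpoint URL, model id, and the exact mechanism for passing reasoning_effort are assumptions; consult Tencent's model card for the supported request format.

```python
from openai import OpenAI

# Hedged sketch: the base_url, model id, and the use of extra_body to
# pass reasoning_effort are assumptions, not a documented API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="hy3-preview",                      # placeholder model id
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    temperature=0.9,                          # Tencent's recommended setting
    top_p=1.0,                                # Tencent's recommended setting
    extra_body={"reasoning_effort": "high"},  # "no_think" | "low" | "high"
)
print(resp.choices[0].message.content)
```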
Training and Quantization
The release includes a complete training pipeline supporting both full fine-tuning and LoRA, with DeepSpeed ZeRO configurations and LLaMA-Factory integration. Tencent also provides AngelSlim, a compression toolkit that supports low-bit quantization and speculative sampling.
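Tencent's exact fine-tuning recipe is not reproduced here, but as a rough sketch of what a LoRA configuration involves, the following uses Hugging Face PEFT; the rank, scaling factor, and target module names are illustrative assumptions, not published hyperparameters.

```python
from peft import LoraConfig, TaskType

# Illustrative LoRA setup via Hugging Face PEFT. Rank, alpha, and
# target projection names are assumptions, not Tencent's recipe.
lora_config = LoraConfig(
    r=16,            # low-rank adapter dimension
    lora_alpha=32,   # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type=TaskType.CAUSAL_LM,
)
# With a loaded base model: model = get_peft_model(base_model, lora_config)
```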
Pricing information has not been disclosed. The model is available for commercial use under Tencent's community license.
What This Means
Hy3-preview represents Tencent's entry into the 200B+ parameter MoE space, competing directly with models like DeepSeek-V3 (671B total, 37B active) and Kimi-K2 (1043B total, 32B active). With 21B active parameters versus competitors' 32-37B, Hy3-preview achieves competitive performance while potentially reducing inference costs. The strong LiveCodeBench score (34.86% vs DeepSeek-V3's 29.31%) and the emphasis on agent capabilities suggest Tencent is positioning this model for practical coding and agentic applications rather than pure benchmark optimization. The 256K context window and custom context-learning benchmarks indicate a focus on real-world enterprise use cases.