
Tencent Releases Hy3 Preview: Mixture-of-Experts Model with 262K Context and Configurable Reasoning

TL;DR

Tencent has released Hy3 preview, a Mixture-of-Experts model with a 262,144 token context window priced at $0.066 per million input tokens and $0.26 per million output tokens. The model features three configurable reasoning modes—disabled, low, and high—designed for agentic workflows and production environments.


Hy3 Preview — Quick Specs

- Context window: 262,144 tokens (262K)
- Input: $0.066 per 1M tokens
- Output: $0.26 per 1M tokens


Tencent has released Hy3 preview, a Mixture-of-Experts (MoE) model with a 262,144 token context window, priced at $0.066 per million input tokens and $0.26 per million output tokens.

Key Specifications

The model supports three configurable reasoning levels: disabled, low, and high modes. According to Tencent, this allows users to balance computational speed against reasoning depth depending on task requirements.

Hy3 preview is designed specifically for agentic workflows and production environments. Tencent claims the model delivers strong code generation capabilities and reliable performance across multi-step, real-world workflows.

Reasoning Architecture

The model exposes its reasoning process through a reasoning_details array in API responses. When enabled, the model shows step-by-step thinking before producing final answers. To maintain reasoning continuity across conversation turns, developers must preserve the complete reasoning_details when passing messages back to the model.
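The round-trip described above can be sketched as follows. This is a minimal illustration, assuming an OpenAI-style message shape; the exact internal layout of the `reasoning_details` entries is not specified in the release information, so the sample entry here is hypothetical:

```python
# Sketch: carrying reasoning_details across conversation turns.
# Field names follow the OpenAI-style message shape; the contents of the
# reasoning_details entries below are illustrative, not documented values.

def extend_history(messages, assistant_message):
    """Append the assistant turn, keeping its reasoning_details intact."""
    turn = {
        "role": "assistant",
        "content": assistant_message["content"],
    }
    # Dropping this field would break reasoning continuity on the next turn.
    if "reasoning_details" in assistant_message:
        turn["reasoning_details"] = assistant_message["reasoning_details"]
    messages.append(turn)
    return messages

history = [{"role": "user", "content": "Plan the migration in steps."}]
reply = {  # shape of a hypothetical API response message
    "content": "Step 1: snapshot the database...",
    "reasoning_details": [{"type": "reasoning.text", "text": "User wants a plan..."}],
}
history = extend_history(history, reply)
```

The key point is simply that the assistant turn is passed back verbatim, reasoning payload included, rather than stripped down to `content` alone.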

The reasoning feature is controlled via a reasoning parameter in API requests, allowing developers to toggle between the three modes based on task complexity.
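A hedged sketch of what such a request payload could look like. The model slug `tencent/hy3-preview` and the exact shape of the `reasoning` field (an `effort` of `low`/`high`, or `enabled: false` to disable it) are assumptions modeled on OpenRouter's conventions, not confirmed values from the release:

```python
# Sketch: selecting one of the three reasoning modes in the request payload.
# The model slug and reasoning-field shape are assumptions, not confirmed.

def build_request(prompt, reasoning_mode):
    """Build a chat-completions payload. reasoning_mode: 'disabled', 'low', or 'high'."""
    payload = {
        "model": "tencent/hy3-preview",  # assumed slug
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_mode == "disabled":
        payload["reasoning"] = {"enabled": False}
    else:
        payload["reasoning"] = {"effort": reasoning_mode}
    return payload

req = build_request("Refactor this function.", "low")
```

In practice the payload would be POSTed to the provider's chat-completions endpoint; the point here is only that mode selection is a per-request toggle rather than a model-level setting.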

Availability

Hy3 preview is available through OpenRouter's API, which normalizes requests and responses across multiple model providers. The model was released on April 22, 2026, according to the OpenRouter model registry.

Model weights are available, though distribution details were not specified in the release information.

Pricing Context

At $0.066 per million input tokens, Hy3 preview sits in the lower cost tier for frontier models. The roughly 3.9x spread between input and output pricing ($0.26 per million output tokens) is in line with the typical input-output ratio providers charge for generation-heavy workloads.
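For a concrete sense of what these rates imply, a quick back-of-the-envelope calculation (the token counts are illustrative, not from the release):

```python
# Per-request cost at the published rates:
# $0.066 per 1M input tokens, $0.26 per 1M output tokens.
INPUT_RATE = 0.066 / 1_000_000
OUTPUT_RATE = 0.26 / 1_000_000

def request_cost(input_tokens, output_tokens):
    """Dollar cost of a single request at Hy3 preview's published rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: an agentic turn with a large context and a modest completion.
cost = request_cost(200_000, 4_000)  # $0.01320 input + $0.00104 output
```

Even a request that fills most of the 262K window stays well under two cents, which is the economics that makes long-context agentic loops viable at this tier.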

What This Means

Tencent's entry with a configurable reasoning model signals continued competition in the agent-focused AI space. The three-tier reasoning system is a practical approach to the speed-versus-accuracy tradeoff that developers face when building production systems. The 262K context window places it in the extended-context category, though still below the 1M+ context leaders. The combination of MoE architecture, configurable reasoning, and competitive pricing makes this a relevant option for developers building multi-step agentic applications who need cost-effective inference with reasoning capabilities.

