model release

InclusionAI Releases Ring-2.6-1T: 1 Trillion Parameter Thinking Model with 63B Active Parameters

TL;DR

InclusionAI has released Ring-2.6-1T, a 1-trillion-parameter model with 63 billion active parameters and a 262,144-token context window. The model features adaptive reasoning modes and is designed for coding agents, tool use, and long-horizon task execution.

InclusionAI has released Ring-2.6-1T, a thinking model with 1 trillion total parameters, of which only 63 billion are active during inference. The model is now available on OpenRouter with a 262,144-token context window.

Architecture and Performance

Ring-2.6-1T uses a sparse architecture that activates 63B of its 1T total parameters, designed to balance capability with operational efficiency. According to InclusionAI, the model delivers leading results on PinchBench, ClawEval, TAU2-Bench, and GAIA2-search benchmarks, though specific scores were not disclosed.
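The trade-off behind that 63B-of-1T split can be sketched with back-of-envelope arithmetic. Only the total/active parameter counts come from the announcement; the bytes-per-parameter figure below is an assumed 8-bit deployment, not a disclosed detail.

```python
# Back-of-envelope comparison of dense vs. sparse inference cost.
# 1T total / 63B active comes from the announcement; the 8-bit
# weight format is an illustrative assumption.

TOTAL_PARAMS = 1_000e9   # ~1 trillion total parameters
ACTIVE_PARAMS = 63e9     # 63B activated per token
BYTES_PER_PARAM = 1      # assumed 8-bit quantized weights

# All expert weights must still be resident in memory...
weight_memory_tb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e12

# ...but per-token compute scales with the active subset only
# (roughly 2 FLOPs per active parameter per token).
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
compute_reduction = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"resident weights:  {weight_memory_tb:.1f} TB")
print(f"active fraction:   {active_fraction:.1%}")
print(f"compute reduction: {compute_reduction:.1f}x vs. dense")
```

Under these assumptions the model pays a dense model's memory bill (about 1 TB of resident weights at 8-bit) while doing only about 6.3% of a 1T dense model's per-token compute.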

The model features adaptive reasoning with "high" and "xhigh" modes that dynamically allocate reasoning budget based on task complexity. This approach aims to reduce token overhead in multi-turn agent workflows compared to fixed reasoning strategies.
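Since the model is served through OpenRouter, the reasoning budget would presumably be selected per request. The sketch below builds a plausible request body using OpenRouter's documented `reasoning` object; the model slug `inclusionai/ring-2.6-1t` and the `"xhigh"` effort level are assumptions based on the announcement, not verified identifiers.

```python
import json

# Hypothetical chat-completions request body for OpenRouter.
# OpenRouter's API accepts a "reasoning" object to control reasoning
# budget; the model slug and "xhigh" level below are unverified
# assumptions taken from the announcement's wording.
payload = {
    "model": "inclusionai/ring-2.6-1t",  # assumed slug
    "messages": [
        {
            "role": "user",
            "content": "Debug this failing test and propose a fix.",
        }
    ],
    # "high" for routine turns; "xhigh" would allocate a larger
    # reasoning budget for harder tasks, per the announcement.
    "reasoning": {"effort": "high"},
}

body = json.dumps(payload, indent=2)
print(body)
```

In a multi-turn agent loop, the caller could keep `"high"` as the default and escalate to `"xhigh"` only when a step fails, which is the token-saving pattern the adaptive modes appear designed for.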

Target Use Cases

InclusionAI positions Ring-2.6-1T for three primary applications:

  • Coding agents: Advanced code generation and debugging workflows
  • Tool use: Multi-step operations requiring external API calls and function execution
  • Long-horizon tasks: Complex autonomous systems that require planning across extended interactions

The model's 262K context window enables handling of large codebases and extended conversation histories without truncation.

Availability and Pricing

Ring-2.6-1T is available through OpenRouter's platform with a "free" tier, though pricing for paid usage tiers was not disclosed. The model was released on May 8, 2026.

No information about alternative API access, self-hosting options, or licensing terms has been disclosed.

What This Means

The sparse activation approach—using only 63B of 1T parameters—represents a continued industry trend toward mixture-of-experts and conditional compute architectures that reduce inference costs while maintaining model capacity. The 262K context window places Ring-2.6-1T among longer-context models, though it remains below the 1M+ token windows recently announced by some competitors. The focus on agent workflows and tool use suggests InclusionAI is targeting the growing market for autonomous AI systems rather than pure chat applications. However, without disclosed benchmark scores or third-party validation, actual performance relative to established models like GPT-4, Claude, or DeepSeek remains uncertain.

Related Articles

model release

Allen Institute releases EMO, 14B parameter MoE model with selective 12.5% expert use

Allen Institute for AI released EMO, a 1B-active, 14B-total-parameter mixture-of-experts model trained on 1 trillion tokens. The model uses 8 active experts per token from a pool of 128 total experts, and can maintain near full-model performance while using just 12.5% of its experts for specific tasks.

model release

Zyphra Releases ZAYA1-8B: 8.4B Parameter MoE Model with 760M Active Parameters Matches 80B+ Models on Math Benchmarks

Zyphra has released ZAYA1-8B, a mixture-of-experts language model with 760M active parameters and 8.4B total parameters. The model scores 89.1% on AIME 2026, competitive with models exceeding 100B parameters, while maintaining efficiency for on-device deployment.

model release

Google DeepMind releases Gemma 4 with 31B dense model, 256K context window, and speculative decoding drafters

Google DeepMind has released Gemma 4, a family of open-weight multimodal models including a 31B dense model with 256K context window and four size variants ranging from 2.3B to 30.7B effective parameters. The release includes Multi-Token Prediction (MTP) draft models that achieve up to 2x decoding speedup through speculative decoding while maintaining identical output quality.

model release

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.
