model releaseArcee Ai

Arcee AI releases Trinity-Large-Thinking: 398B sparse MoE model with chain-of-thought reasoning

TL;DR

Arcee AI released Trinity-Large-Thinking, a 398B-parameter sparse Mixture-of-Experts model with approximately 13B active parameters per token, post-trained with extended chain-of-thought reasoning for agentic workflows. The model achieves 94.7% on τ²-Bench, 91.9% on PinchBench, and 98.2% on LiveCodeBench, generating explicit reasoning traces in <think>...</think> blocks before producing responses.

April 8, 2026 · 7:20 AM3 min read

Trinity Large Thinking — Quick Specs

Context window262K tokens

Compare Trinity Large Thinking with other models →

Arcee AI Releases Trinity-Large-Thinking: 398B Sparse MoE Model for Agent Reasoning

Arcee AI released Trinity-Large-Thinking, a 398B-parameter sparse Mixture-of-Experts model with approximately 13B active parameters per token. The model is post-trained with extended chain-of-thought reasoning and agentic reinforcement learning, designed specifically for tool calling, multi-step planning, and agent workflows.

Architecture and Specifications

Trinity-Large-Thinking uses a sparse MoE architecture with 256 total experts (1 shared), of which 4 are active per token, resulting in 1.56% sparsity. The model was trained on 17 trillion tokens using 2,048 NVIDIA B300 GPUs with HSDP and Expert Parallelism, with data partnership from Datology and compute partnership from Prime Intellect.

The pretraining context length is 8,192 tokens, extended to 512k after context length extension. Post-training included instruction tuning and agentic RL with extended chain-of-thought, trained on tool-calling trajectories and multi-step reasoning chains.

Reasoning and Tool Calling

Trinity-Large-Thinking generates explicit reasoning traces wrapped in <think>...</think> blocks before producing final responses. When served via vLLM, reasoning is exposed in a dedicated reasoning_content field in the API response. For multi-turn agentic loops, the full thinking blocks must be preserved in conversation history for subsequent turns—stripping thinking tokens breaks the model's prior reasoning chain.

vLLM deployments require --enable-reasoning --reasoning-parser deepseek_r1 and --enable-auto-tool-choice --tool-call-parser qwen3_coder flags to fully expose reasoning content and structured tool calls in OpenAI-compatible format.

Benchmark Performance

Trinity-Large-Thinking reports the following benchmark scores:

τ²-Airline: 88.0% (vs. Opus 4.6: 82.0%)
τ²-Telecom: 94.7% (vs. Opus 4.6: 92.1%)
PinchBench: 91.9% (vs. Opus 4.6: 93.3%)
LiveCodeBench: 98.2%
GPQA-Diamond: 76.3% (vs. Opus 4.6: 89.2%)
AIME25: 96.3% (vs. Opus 4.6: 99.8%)
MMLU-Pro: 83.4% (vs. Opus 4.6: 89.1%)
SWE-bench Verified: 63.2% (evaluated in mini-swe-agent-v2; vs. Opus 4.6: 75.6%)
BCFLv4: 70.1% (vs. Opus 4.6: 77.0%)

The model shows strong performance on agentic benchmarks (τ²-Telecom, PinchBench, LiveCodeBench) but trails Claude Opus on general reasoning benchmarks (GPQA-Diamond, AIME25).

Availability and Deployment

Trinity-Large-Thinking is available via:

OpenRouter API: No setup required; full reasoning and tool-calling support
vLLM: Recommended for agentic deployments (supported in vLLM 0.11.1+)
Hugging Face: Direct model download with trust_remote_code=True
Chat interface: chat.arcee.ai

The model works as a drop-in replacement for OpenClaw and Hermes Agent frameworks, with native tool-calling format compatible with agent execution loops.

Model Family

Trinity-Large-Thinking is one of four checkpoints in the Trinity-Large family:

Trinity-Large-Thinking: Reasoning-optimized with agentic post-training (this release)
Trinity-Large-Preview: Lightly post-trained, chat-ready instruct model without reasoning content
Trinity-Large-TrueBase: 10T-token pre-anneal pretraining checkpoint
Trinity-Large-Base: Full 17T-token pretrained foundation model with mid-training anneals

What This Means

Arcee AI's Trinity-Large-Thinking represents a specialized approach to reasoning models, prioritizing agentic performance over general-purpose capabilities. The model excels on task-oriented benchmarks (94.7% τ²-Telecom, 91.9% PinchBench) but underperforms on knowledge-heavy benchmarks compared to Claude Opus, suggesting it trades breadth for depth in agent-specific reasoning. The 512k context window and explicit reasoning traces make it technically suited for long-running agent loops, though real-world performance depends on proper context management—requiring users to preserve thinking tokens across multi-turn conversations. Availability on OpenRouter removes deployment friction for developers building agentic systems.

Source: huggingface.co ↗

arcee-ai model-release mixture-of-experts moe reasoning agent chain-of-thought tool-calling

model releaseJuly 6, 2026

Tencent Releases Hy3: 295B MoE Model with 256K Context and Configurable Reasoning Modes

Tencent has released Hy3, a 295-billion parameter Mixture-of-Experts model with 21 billion active parameters and a 256,000-token context window. The model features configurable reasoning modes and is available free through OpenRouter, with deployment ending July 21, 2026.

model releaseJuly 6, 2026

Tencent Releases Hy3: 295B-Parameter MoE Model with 21B Active Parameters at 256K Context

Tencent has released Hy3, a 295-billion parameter Mixture-of-Experts model with 21 billion active parameters and 3.8 billion MTP layer parameters. The model features a 256K context window and is released under Apache 2.0 license, with pricing not yet disclosed.

model releaseJuly 6, 2026

Nex AGI releases Nex-N2-Mini: open-source agentic MoE model with 262K context window

Nex AGI has released Nex-N2-Mini, an open-source agentic mixture-of-experts model with a 262K-token context window. The model accepts text and image inputs and is priced at $0.025 per 1M input tokens and $0.10 per 1M output tokens.

model releaseJuly 4, 2026

Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance

Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.

Arcee AI releases Trinity-Large-Thinking: 398B sparse MoE model with chain-of-thought reasoning

Trinity Large Thinking — Quick Specs

Arcee AI Releases Trinity-Large-Thinking: 398B Sparse MoE Model for Agent Reasoning

Architecture and Specifications

Reasoning and Tool Calling

Benchmark Performance

Availability and Deployment

Model Family

What This Means

Related Articles

Tencent Releases Hy3: 295B MoE Model with 256K Context and Configurable Reasoning Modes

Tencent Releases Hy3: 295B-Parameter MoE Model with 21B Active Parameters at 256K Context

Nex AGI releases Nex-N2-Mini: open-source agentic MoE model with 262K context window

Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance

Comments