
Arcee AI releases Trinity-Large-Thinking, open reasoning model matching Claude Opus on agent tasks

TL;DR

Arcee AI has released Trinity-Large-Thinking, a 400-billion-parameter open-weight reasoning model with a mixture-of-experts architecture that activates only 13 billion parameters per token. The model is competitive with Claude Opus 4.6 on agent benchmarks such as Tau2 and PinchBench but lags on general reasoning tasks. The company spent approximately $20 million, roughly half its total venture capital, to train the model on 2,048 Nvidia B300 GPUs over 33 days.


Arcee AI Releases Trinity-Large-Thinking Open Reasoning Model

Arcee AI has released Trinity-Large-Thinking, a 400-billion-parameter open-weight reasoning model licensed under Apache 2.0 and designed specifically for agent tasks. The model competes directly with Anthropic's Claude Opus 4.6 on specialized benchmarks while maintaining inference efficiency through a mixture-of-experts architecture that activates only 13 billion parameters per token.
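To put the efficiency claim in numbers, the short calculation below works out what share of the model runs for each token. The parameter counts come from the article; the "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass and is an assumption here, not something Arcee AI has stated.

```python
TOTAL_PARAMS = 400e9    # total parameters, from the article
ACTIVE_PARAMS = 13e9    # parameters active per token, from the article

# Share of the model that participates in processing one token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active share per token: {active_fraction:.2%}")

# Rule-of-thumb forward-pass cost (assumption): ~2 FLOPs per active parameter.
flops_moe = 2 * ACTIVE_PARAMS
flops_dense_equivalent = 2 * TOTAL_PARAMS
print(f"Approx. forward FLOPs per token: {flops_moe:.2e}")
print(f"A dense 400B model would need ~{flops_dense_equivalent / flops_moe:.1f}x more")
```

The estimate ignores attention cost over long contexts and memory bandwidth, so it is only a rough guide to per-token compute.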

Training Investment and Infrastructure

The project consumed approximately $20 million in capital—roughly half of Arcee AI's total venture funding to date. Training ran on 2,048 Nvidia B300 GPUs for 33 consecutive days, processing 17 trillion tokens total. The company partnered with Prime Intellect for GPU cluster provision and DatologyAI for data curation.
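The figures above allow a rough back-of-the-envelope check of the training run. The sketch below assumes that all 17 trillion tokens were processed during the 33-day run and that the GPUs ran the whole time; the cost per GPU-hour is only an upper bound, since the $20 million presumably also covers data, staff, and other expenses.

```python
GPUS = 2048            # Nvidia B300 GPUs, from the article
DAYS = 33              # training duration, from the article
TOKENS = 17e12         # training tokens, from the article
BUDGET_USD = 20e6      # approximate total project cost, not only GPU rental

gpu_hours = GPUS * DAYS * 24
wall_clock_seconds = DAYS * 24 * 3600

# Average throughput, assuming the full token count was processed in this run.
tokens_per_second = TOKENS / wall_clock_seconds
tokens_per_gpu_second = tokens_per_second / GPUS

# Upper bound: treats the entire budget as if it were GPU time.
usd_per_gpu_hour_upper = BUDGET_USD / gpu_hours

print(f"GPU-hours:               {gpu_hours:,}")
print(f"Cluster tokens/second:   {tokens_per_second:,.0f}")
print(f"Tokens/second per GPU:   {tokens_per_gpu_second:,.0f}")
print(f"Budget per GPU-hour (<=): ${usd_per_gpu_hour_upper:.2f}")
```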

The training run remained stable throughout without loss spikes, a notable achievement given the model's scale. The team credited a custom load-balancing method called SMEBU (Soft-clamped Momentum Expert Bias Updates) for preventing expert collapse—a problem that plagued early training runs when individual experts in the 256-expert network stopped receiving tokens.
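The article does not describe how SMEBU works beyond its name. The toy simulation below is therefore only a guess at the general idea the name suggests: a per-expert routing bias that is nudged toward underloaded experts, smoothed with momentum, and squashed by a soft clamp (tanh here). Every function name, constant, and the tanh choice are assumptions for illustration, not Arcee AI's method.

```python
import math
import random

NUM_EXPERTS, TOP_K = 8, 2        # toy sizes; the article's model has 256 experts, 4 active
SKEW = [0.5, 0.5] + [0.0] * 6    # hypothetical router preference for experts 0 and 1

def soft_clamp(x, limit=1.0):
    """Smoothly squash x into (-limit, limit) using tanh (assumed clamp shape)."""
    return limit * math.tanh(x / limit)

def route(rng, bias, num_tokens):
    """Route random toy tokens to their top-k experts; return per-expert counts."""
    counts = [0] * NUM_EXPERTS
    for _ in range(num_tokens):
        scores = [rng.gauss(0.0, 1.0) + SKEW[e] + bias[e] for e in range(NUM_EXPERTS)]
        chosen = sorted(range(NUM_EXPERTS), key=scores.__getitem__, reverse=True)[:TOP_K]
        for e in chosen:
            counts[e] += 1
    return counts

def imbalance(counts):
    """Ratio of busiest to least busy expert; 1.0 means perfectly balanced."""
    return max(counts) / max(min(counts), 1)

def train_bias(steps=200, batch=256, lr=0.05, beta=0.9, seed=0):
    rng = random.Random(seed)
    raw = [0.0] * NUM_EXPERTS        # unclamped accumulated bias
    momentum = [0.0] * NUM_EXPERTS   # smoothed load error per expert
    for _ in range(steps):
        bias = [soft_clamp(r) for r in raw]
        counts = route(rng, bias, batch)
        target = sum(counts) / NUM_EXPERTS
        for e in range(NUM_EXPERTS):
            error = (target - counts[e]) / target   # > 0 means underloaded
            momentum[e] = beta * momentum[e] + (1 - beta) * error
            raw[e] += lr * momentum[e]
    return [soft_clamp(r) for r in raw]

# Evaluate on the same random tokens before and after learning the bias.
before = imbalance(route(random.Random(1), [0.0] * NUM_EXPERTS, 4096))
learned_bias = train_bias()
after = imbalance(route(random.Random(1), learned_bias, 4096))
print(f"imbalance before: {before:.2f}, after: {after:.2f}")
```

In this toy setup the bias only affects which experts are selected, so an expert that drifts toward receiving no tokens gets its score raised until it is picked again, which is the failure mode (expert collapse) the article describes.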

Architecture and Capabilities

Trinity-Large-Thinking uses 256 specialized sub-networks (experts), of which only 4 are active per token, reducing computational overhead while preserving parameter capacity. The model generates explicit reasoning in special "think blocks" before each answer and is optimized for tool calling, multi-stage planning, and autonomous workflows.
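A minimal sketch of top-k expert routing with the article's sizes (256 experts, 4 active) is shown below. The gating function is not described in the article; selecting the four highest router scores and normalizing them with a softmax is a common choice and is assumed here, as are the random router scores.

```python
import math
import random

NUM_EXPERTS = 256     # from the article
ACTIVE_EXPERTS = 4    # from the article

def route_token(router_logits, k=ACTIVE_EXPERTS):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)), key=router_logits.__getitem__, reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, w / total) for i, w in zip(top, exps)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]   # stand-in router scores
selection = route_token(logits)

for expert_id, weight in selection:
    print(f"expert {expert_id:3d}  weight {weight:.3f}")
print(f"experts used per token: {ACTIVE_EXPERTS / NUM_EXPERTS:.2%}")
```

Only the selected experts run for a given token, and their outputs would be combined using these weights. The share of active parameters reported in the article (13 billion of 400 billion) is higher than 4 of 256, which would be expected if non-expert layers such as attention run for every token, although the article does not break this down.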

The architecture combines local attention layers (each covering a limited section of the text) with global layers (spanning the entire context) to support a 512K-token context window without a proportional increase in compute. On the Needle-in-a-Haystack benchmark at 512K tokens, it achieved 0.976 accuracy, even though it was trained at a context length of 256K tokens.
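The saving from local attention can be illustrated by counting how many query-key pairs each layer type has to score. The sketch below assumes causal attention, a sliding-window form of local attention, a hypothetical window of 4,096 tokens, and 512K meaning 512 × 1,024; none of these details are given in the article.

```python
def causal_pairs(n):
    """Query-key pairs in a global causal layer: token i attends to tokens 0..i."""
    return n * (n + 1) // 2

def sliding_window_pairs(n, window):
    """Pairs in a local causal layer where each token sees at most `window` tokens."""
    if n <= window:
        return causal_pairs(n)
    return causal_pairs(window) + (n - window) * window

def build_mask(n, window=None):
    """Explicit boolean mask, used only to sanity-check the formulas on small inputs."""
    return [[j <= i and (window is None or i - j < window) for j in range(n)]
            for i in range(n)]

CONTEXT = 512 * 1024   # assumed meaning of "512K"
WINDOW = 4096          # hypothetical local window size

global_cost = causal_pairs(CONTEXT)
local_cost = sliding_window_pairs(CONTEXT, WINDOW)
print(f"global layer pairs: {global_cost:.3e}")
print(f"local layer pairs:  {local_cost:.3e}")
print(f"local / global:     {local_cost / global_cost:.4f}")
```

A global layer grows quadratically with context length while a local layer grows linearly, so a stack that uses mostly local layers keeps long-context cost manageable.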

Benchmark Performance: Strength in Agents, Weakness in General Reasoning

On agent-specific benchmarks, and on the AIME25 math benchmark, Trinity-Large-Thinking performs competitively:

  • Tau2-Airline: 88 (first place)
  • PinchBench: 91.9 (second place, vs. Claude Opus 4.6's 93.3)
  • AIME25: 96.3

General reasoning benchmarks reveal significant gaps:

  • GPQA-Diamond: 76.3 (vs. Claude Opus 4.6's 89.2)
  • MMLU-Pro: 83.4 (vs. Claude Opus 4.6's 89.1)
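The size of each gap to Claude Opus 4.6 can be computed directly from the scores reported above. The snippet uses only numbers from the article; benchmarks without a stated Claude Opus 4.6 score are left out.

```python
# Scores as reported in the article: (Trinity-Large-Thinking, Claude Opus 4.6)
scores = {
    "PinchBench": (91.9, 93.3),
    "GPQA-Diamond": (76.3, 89.2),
    "MMLU-Pro": (83.4, 89.1),
}

# Positive gap means Claude Opus 4.6 scores higher.
gaps = {name: round(opus - trinity, 1) for name, (trinity, opus) in scores.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:13s} gap to Claude Opus 4.6: {gap:4.1f} points")
```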

The base model reportedly matches GLM 4.5 performance despite activating substantially fewer parameters per token.

Training Data and Synthetic Contribution

Approximately 8 trillion of the 17 trillion training tokens were synthetically generated—among the largest documented uses of synthetic data for pretraining. This includes 6.5 trillion tokens of rewritten web text, ~1 trillion multilingual tokens, and ~800 billion code tokens.
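As a quick consistency check, the three synthetic categories can be summed and compared with the total. The values are the approximate figures quoted above, so the sum is only expected to land near the stated 8 trillion.

```python
TOTAL_TOKENS_T = 17.0   # total training tokens in trillions, from the article

# Approximate synthetic token counts in trillions, from the article.
synthetic_t = {
    "rewritten web text": 6.5,
    "multilingual": 1.0,
    "code": 0.8,
}

synthetic_total = sum(synthetic_t.values())
synthetic_share = synthetic_total / TOTAL_TOKENS_T

for name, amount in synthetic_t.items():
    print(f"{name:20s} {amount:4.1f}T  ({amount / TOTAL_TOKENS_T:.1%} of all tokens)")
print(f"{'synthetic total':20s} {synthetic_total:4.1f}T  ({synthetic_share:.1%} of all tokens)")
```

The categories add up to slightly more than 8 trillion, which is consistent with the rounded figures in the article: a little under half of the training corpus was synthetic.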

A novel data processing method called Random Sequential Document Buffer (RSDB) randomizes document order rather than processing consecutive documents sequentially, reducing distribution drift between training steps.
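The article gives only this one-sentence description of RSDB, so the following is a generic shuffle-buffer sketch of the underlying idea rather than Arcee AI's implementation. The function name, buffer size, and toy document stream are all hypothetical.

```python
import random

def buffered_shuffle(documents, buffer_size, seed=0):
    """Yield documents in randomized order using a fixed-size buffer.

    Each incoming document replaces a randomly chosen buffered one, which is
    emitted, so consecutive source documents end up spread apart in the output.
    """
    rng = random.Random(seed)
    buffer = []
    for doc in documents:
        if len(buffer) < buffer_size:
            buffer.append(doc)          # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]               # emit a random buffered document
        buffer[idx] = doc               # and put the new one in its slot
    rng.shuffle(buffer)                 # drain what is left in random order
    yield from buffer

def same_source_neighbors(docs):
    """Count adjacent pairs that come from the same source."""
    return sum(a.split("-")[0] == b.split("-")[0] for a, b in zip(docs, docs[1:]))

# Toy stream: three sources, each contributing 100 consecutive documents.
stream = [f"src{s}-doc{i}" for s in range(3) for i in range(100)]
shuffled = list(buffered_shuffle(stream, buffer_size=64))

print("same-source neighbors, sequential:", same_source_neighbors(stream))
print("same-source neighbors, buffered:  ", same_source_neighbors(shuffled))
```

Mixing sources within each stretch of the stream is what would keep consecutive training batches closer to the overall data distribution, which matches the drift reduction the article mentions.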

Current Limitations and Future Plans

Arcee AI describes the current version as preliminary. The fine-tuning phase, which focused on tool use and multi-step reasoning, ran shorter than planned because of limited GPU cluster availability. The company plans more extensive post-training for future iterations.

A preview version released earlier on OpenRouter processed 3.37 trillion tokens in its first two months and ranked among the most-used open models in the US on that platform. The reasoning version is now live on OpenRouter and integrates with agent frameworks including OpenClaw and Hermes Agent.

Market Context

Arcee AI positions Trinity-Large-Thinking as the most powerful open model outside China, a response to the dominance of Chinese developers such as Alibaba's Qwen team, MiniMax, and Zhipu AI in the open-weight space. The release arrives shortly after Google's Gemma 4 announcement, another open model family that uses a mixture-of-experts architecture under Apache 2.0 licensing.

What This Means

Trinity-Large-Thinking demonstrates that Western open-weight AI development can match proprietary models in narrow domains (agent tasks) while accepting weaknesses elsewhere. The $20 million commitment signals the scale of infrastructure investment that competitive open models require. However, the gap in general reasoning (76.3 vs. 89.2 on GPQA-Diamond) indicates that specialized optimization comes with trade-offs. For agent-specific applications involving tool use and planning, this model provides a viable open alternative; for general-purpose reasoning, Claude Opus and similar models remain ahead.
