model releaseArcee Ai

Arcee AI Releases Trinity Large Preview: 400B-Parameter MoE Model with 512K Context Window

TL;DR

Arcee AI has released Trinity Large Preview, a 400B-parameter sparse Mixture-of-Experts model with 13B active parameters per token using 4-of-256 expert routing. The model supports context windows up to 512K tokens and is available with open weights under permissive licensing.

April 22, 2026 · 4:36 PM2 min read

Trinity Large Preview — Quick Specs

Context window131K tokens

Compare Trinity Large Preview with other models →

Arcee AI Releases Trinity Large Preview: 400B-Parameter MoE Model with 512K Context Window

Arcee AI has released Trinity Large Preview, a 400B-parameter sparse Mixture-of-Experts (MoE) language model with 13B active parameters per token. The model uses 4-of-256 expert routing architecture and is currently available for free through OpenRouter.

Technical Specifications

Trinity Large Preview features:

Total parameters: 400 billion (sparse)
Active parameters per token: 13 billion
Expert routing: 4-of-256 MoE architecture
Context window: Up to 512K tokens (native support)
Current deployment: 128K context window using 8-bit quantization
Pricing: Free during preview period (ends April 22, 2026)
License: Open weights with permissive licensing

Capabilities and Target Use Cases

According to Arcee AI, Trinity Large Preview excels in creative writing, storytelling, role-play, chat scenarios, and real-time voice assistance. The company claims the model performs better in these areas than typical reasoning models.

The model was specifically trained for agentic workflows, designed to navigate agent frameworks including OpenCode, Cline, and Kilo Code. Arcee AI states it handles complex toolchains and long, constraint-filled prompts effectively.

Deployment Details

The Preview API currently serves the model at 128K context using 8-bit quantization for practical deployment, though the architecture natively supports context windows up to 512K tokens. The model is available through OpenRouter's API with standard OpenAI-compatible formatting.

Benchmark scores have not been disclosed. The free preview period will end on April 22, 2026, though future pricing has not been announced.

What This Means

Trinity Large Preview represents Arcee AI's entry into frontier-scale models with a focus on efficiency through sparse MoE architecture. The 400B total parameter count with only 13B active per token aims to deliver large model capabilities at lower computational cost. The open weights and permissive licensing lower barriers for developers and researchers to experiment with frontier-scale models, particularly for agentic applications. However, the lack of published benchmark scores makes it difficult to assess performance against competing models in the 400B+ parameter class.

Source: openrouter.ai ↗

Arcee AI Trinity Large Preview Mixture of Experts MoE open weights agentic AI long context model release

model releaseJune 5, 2026

Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows

Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts model with 55 billion active parameters and support for up to 1 million token context windows. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows including agent orchestration, coding agents, and complex enterprise tasks.

model releaseJune 4, 2026

NVIDIA Nemotron 3 Ultra launches on AWS SageMaker with 550B parameters, 1M token context window

NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart with 550 billion total parameters and 55 billion active parameters. The model features a hybrid Transformer-Mamba Mixture-of-Experts architecture and supports context windows up to 1 million tokens, targeting agentic AI workloads.

model releaseJune 5, 2026

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and up to 1M token context window. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, trained with NVFP4 quantization-aware methods from December 2025 to April 2026.

model releaseJune 4, 2026

Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window

Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts model with 55B active parameters and a 1M token context window. The model uses a hybrid Transformer-Mamba architecture and is available for free through OpenRouter, targeting agentic workflows and multi-step reasoning tasks.

Arcee AI Releases Trinity Large Preview: 400B-Parameter MoE Model with 512K Context Window

Trinity Large Preview — Quick Specs

Arcee AI Releases Trinity Large Preview: 400B-Parameter MoE Model with 512K Context Window

Technical Specifications

Capabilities and Target Use Cases

Deployment Details

What This Means

Related Articles

Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows

NVIDIA Nemotron 3 Ultra launches on AWS SageMaker with 550B parameters, 1M token context window

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window

Comments