
Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows

TL;DR

Nvidia has released Nemotron 3 Ultra, a 550-billion-parameter mixture-of-experts model with 55 billion active parameters and support for context windows of up to 1 million tokens. The model uses a hybrid Transformer-Mamba architecture and is designed for long-running agentic workflows, including agent orchestration, coding agents, and complex enterprise tasks.

June 5, 2026 · 2:20 PM · 2 min read

Nemotron 3 Ultra — Quick Specs

Context window: 1M tokens
Input: $0.50 per 1M tokens
Output: $2.50 per 1M tokens


Nvidia Releases Nemotron 3 Ultra: 550B-Parameter MoE Model for Agentic AI

Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts (MoE) model with 55 billion active parameters, designed for long-running agentic workflows and complex reasoning tasks.

Technical Specifications

The model features a hybrid Transformer-Mamba mixture-of-experts architecture and supports context windows of up to 1 million tokens. According to Nvidia, it handles text input and output only, positioning it as a frontier reasoning and orchestration model rather than a multimodal system.

Pricing is set at $0.50 per million input tokens and $2.50 per million output tokens through OpenRouter, which routes requests to providers handling the model.
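To make the listed rates concrete, here is a minimal sketch of the per-request arithmetic. The function name and the example token counts are hypothetical, chosen only for illustration; the two prices are the ones quoted above. Actual billing may differ by provider.

```python
# Prices quoted in the article, in USD per 1 million tokens (via OpenRouter).
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: a full 1M-token prompt and a 4,000-token reply.
cost = estimate_cost(1_000_000, 4_000)
print(f"${cost:.2f}")  # prints $0.51
```

At these rates a prompt that fills the entire context window costs about $0.50 in input tokens alone, so long-running agents that resend large contexts on every step would be dominated by input cost.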

Target Use Cases

Nvidia designed Nemotron 3 Ultra specifically for:

Agent orchestration across multiple AI systems
Coding agents requiring extended context
Deep research tasks with lengthy documents
Complex enterprise workflows
Multi-step reasoning and planning

The company claims the model is "particularly strong at multi-step reasoning and planning" with "high-throughput inference designed for high-volume agent pipelines."

Architecture and Performance

The hybrid Transformer-Mamba architecture departs from pure transformer designs. The MoE approach activates only 55 billion of the 550 billion total parameters for each token processed, which reduces compute per token while preserving the capacity of the full model.
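The active-versus-total parameter split can be put into rough numbers. The sketch below uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass. That rule is an assumption here, not a figure from Nvidia, and it ignores attention and Mamba state costs that depend on context length. The function name is hypothetical.

```python
# Parameter counts quoted in the article.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9


def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost, assuming ~2 FLOPs per active parameter per token."""
    return 2 * active_params


# Share of the model's weights that participate in processing a token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Compare against a hypothetical dense model with the same total parameter count.
dense_flops = forward_flops_per_token(TOTAL_PARAMS)
sparse_flops = forward_flops_per_token(ACTIVE_PARAMS)

print(f"active fraction: {active_fraction:.0%}")
print(f"approx. compute reduction vs. dense: {dense_flops / sparse_flops:.0f}x")
```

The saving applies to compute, not memory: all 550 billion parameters still have to be stored and served, since different tokens may route to different experts.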

The 1-million-token context window places Nemotron 3 Ultra in the extended-context tier, between Anthropic's Claude 3.5 Sonnet (200K tokens) and Google's Gemini 1.5 Pro (2M tokens). Specific benchmark scores have not been disclosed.

Availability

Nemotron 3 Ultra is part of Nvidia's Nemotron family of open models for agentic AI. Model weights are available as a standard release, though Nvidia has not specified licensing terms or a direct download location; the only access path cited is routing through OpenRouter.

What This Means

Nemotron 3 Ultra targets a specific niche: long-context agentic workflows where extended reasoning and orchestration matter more than raw speed. The hybrid Transformer-Mamba architecture and MoE design suggest Nvidia is prioritizing inference efficiency for sustained agent operations over single-query performance. However, without published benchmark scores or independent testing, its actual capabilities relative to established models remain unverified. The pricing sits in the mid-range for frontier models, making it cost-competitive for high-volume agent deployments if performance claims hold.

Source: openrouter.ai


