Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows

TL;DR

Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts model with 55 billion active parameters and support for up to 1 million token context windows. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows including agent orchestration, coding agents, and complex enterprise tasks.

Nemotron 3 Ultra — Quick Specs

Context window: 1M tokens
Input: $0.50 per 1M tokens
Output: $2.50 per 1M tokens

Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts (MoE) model with 55 billion active parameters, designed for long-running agentic workflows and complex reasoning tasks.

Technical Specifications

The model features a hybrid Transformer-Mamba mixture-of-experts architecture and supports context windows of up to 1 million tokens. According to Nvidia, it handles text input and output only, positioning it as a frontier reasoning and orchestration model rather than a multimodal system.

Pricing is set at $0.50 per million input tokens and $2.50 per million output tokens through OpenRouter, which routes requests to providers handling the model.
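To make the quoted rates concrete, here is a minimal cost sketch. The two prices come from the figures above; the function name and the example token counts are illustrative assumptions, and actual billing may differ by provider.

```python
# Prices quoted in the article for OpenRouter, in US dollars per 1M tokens.
INPUT_USD_PER_MTOK = 0.50
OUTPUT_USD_PER_MTOK = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request at the quoted rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agent step: 200,000 tokens of context in, 4,000 tokens out.
print(f"${estimate_cost_usd(200_000, 4_000):.4f}")  # prints $0.1100
```

At these rates a long-context agent's bill is dominated by input tokens: the 200,000 input tokens in the example cost $0.10, while the 4,000 output tokens cost $0.01.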

Target Use Cases

Nvidia designed Nemotron 3 Ultra specifically for:

  • Agent orchestration across multiple AI systems
  • Coding agents requiring extended context
  • Deep research tasks with lengthy documents
  • Complex enterprise workflows
  • Multi-step reasoning and planning

The company claims the model is "particularly strong at multi-step reasoning and planning" with "high-throughput inference designed for high-volume agent pipelines."

Architecture and Performance

The hybrid Transformer-Mamba architecture departs from pure Transformer designs by pairing attention layers with Mamba state-space layers. The MoE design activates only 55 billion of the 550 billion total parameters, about 10%, for each token it processes, which cuts compute per token while keeping the capacity of the full model.
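The idea of activating only a subset of parameters can be illustrated with a toy top-k router. This is a generic sketch of MoE routing, not Nvidia's implementation: the article does not describe the expert count, the value of k, or the router, so the ten scalar "experts" and k=1 below are assumptions chosen only for illustration.

```python
import math

# Figures from the article; the ratio is the share of parameters used per token.
TOTAL_PARAMS = 550_000_000_000
ACTIVE_PARAMS = 55_000_000_000
ACTIVE_FRACTION = ACTIVE_PARAMS / TOTAL_PARAMS  # 0.1


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Run only the k selected experts and combine their weighted outputs."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy setup (assumed): 10 experts, each multiplying its input by 1..10.
experts = [lambda x, s=s: s * x for s in range(1, 11)]
logits = [0.0] * 10
logits[6] = 5.0  # the router strongly prefers expert index 6 (scale 7)

y = moe_forward(3.0, experts, logits, k=1)
print(ACTIVE_FRACTION, y)  # prints 0.1 21.0
```

With k=1 only one of the ten toy experts runs for the input; the other nine contribute no compute. In a real model the active share also includes non-expert layers, so the expert ratio and the parameter ratio need not match.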

The 1-million-token context window is five times the 200K window of Anthropic's Claude 3.5 Sonnet and half the 2M window of Google's Gemini 1.5 Pro. Nvidia has not disclosed specific benchmark scores.
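As a rough sketch of what a 1-million-token window means in practice, the helper below checks whether a set of documents fits while leaving room for the model's reply. The window size is the article's figure; the function name, the document sizes, and the output reserve are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # figure quoted in the article


def fits_in_context(doc_token_counts, reserved_output_tokens=0,
                    window=CONTEXT_WINDOW_TOKENS):
    """Return (fits, remaining) for documents plus a reserved output budget."""
    used = sum(doc_token_counts) + reserved_output_tokens
    return used <= window, window - used


# Hypothetical research task: three long reports plus a 20,000-token reply budget.
docs = [350_000, 280_000, 300_000]
fits, remaining = fits_in_context(docs, reserved_output_tokens=20_000)
print(fits, remaining)  # prints True 50000

# The same load against a 200K window falls far short.
print(fits_in_context(docs, 20_000, window=200_000))  # prints (False, -750000)
```

Token counts depend on the tokenizer, so real inputs would need to be measured with the model's own tokenizer rather than estimated.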

Availability

Nemotron 3 Ultra is part of Nvidia's Nemotron family of open models for agentic AI. Model weights are available as a standard release, though Nvidia has not specified licensing terms or direct download locations beyond OpenRouter routing.

What This Means

Nemotron 3 Ultra targets a specific niche: long-context agentic workflows where extended reasoning and orchestration matter more than raw speed. The hybrid Transformer-Mamba architecture and MoE design suggest Nvidia is prioritizing inference efficiency for sustained agent operations over single-query performance. However, without published benchmark scores or independent testing, its actual capabilities relative to established models remain unverified. The pricing sits in the mid-range for frontier models, making it cost-competitive for high-volume agent deployments if performance claims hold.
