Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window
Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts model with 55B active parameters and a 1M token context window. The model uses a hybrid Transformer-Mamba architecture and is available for free through OpenRouter, targeting agentic workflows and multi-step reasoning tasks.
Nemotron 3 Ultra — Quick Specs
Nvidia Releases Nemotron 3 Ultra: 550B Parameter MoE Model with 1M Token Context Window
Nvidia has released Nemotron 3 Ultra, a 550B parameter mixture-of-experts (MoE) model with 55B active parameters and support for up to 1M token context windows. The model is available for free through OpenRouter.
Architecture and Specifications
Nemotron 3 Ultra uses a hybrid Transformer-Mamba mixture-of-experts architecture, with 55B parameters active during inference out of 550B total parameters. According to Nvidia, the model is designed for "high-throughput inference" in production agentic pipelines.
The model supports text-only input and output with a context window of 1M tokens. Nvidia states the model is "particularly strong at multi-step reasoning and planning."
Target Use Cases
Nvidia positions Nemotron 3 Ultra for:
- Agent orchestration
- Coding agents
- Deep research tasks
- Complex enterprise workflows
- Long-running agentic pipelines
The model is part of Nvidia's Nemotron family of open models focused on agentic AI applications.
Availability and Pricing
Nemotron 3 Ultra is available for free through OpenRouter, which routes requests to providers capable of handling the model's context window and parameters. Pricing per 1M tokens not disclosed for direct API access. Model weights are available, though specific hosting details were not provided.
The release date listed as June 4, 2026, appears to be an error in the source documentation.
What This Means
Nvidia's entry into the 1M+ context window space with a free MoE model intensifies competition in the agent-focused model segment. The hybrid Transformer-Mamba architecture represents a technical bet on alternatives to pure Transformer architectures for long-context scenarios. The 55B active parameter configuration suggests Nvidia is optimizing for inference efficiency over raw parameter count, though benchmark scores remain undisclosed. Free availability through OpenRouter lowers the barrier for developers building multi-step reasoning applications, potentially accelerating adoption in enterprise agent workflows.
Related Articles
NVIDIA Releases Nemotron 3.5 ASR: 600M-Parameter Streaming Speech Model for 40 Languages
NVIDIA released Nemotron 3.5 ASR, a 600M-parameter speech-to-text model supporting 40 language-locales from a single checkpoint. The model achieves 0.07 seconds to final transcript after speech ends and ranks 2nd in latency among streaming ASR models according to Artificial Analysis benchmarks.
NVIDIA Shows Task-Seeded Synthetic Data Boosts Nemotron-3 Nano by +11.1 on GPQA
NVIDIA demonstrated that task-seeded synthetic Q&A data improves model performance across multiple benchmarks in a 100B-token continuation experiment on Nemotron-3 Nano. The approach improved GPQA scores by +11.1 points, MMLU-Pro by +1.8, average code by +1.9, and commonsense understanding by +1.6.
JetBrains Releases Mellum2-12B Reasoning Model with 131K Context and Mixture-of-Experts Architecture
JetBrains has released Mellum2-12B-A2.5B-Thinking, a reasoning-augmented assistant model with 131,072-token context window and 64 Mixture-of-Experts architecture that activates 8 experts per token. The model emits explicit chain-of-thought reasoning inside <think> blocks before providing final answers.
Google DeepMind Releases Gemma 4: Encoder-Free Multimodal Models from 2.3B to 30.7B Parameters
Google DeepMind released Gemma 4, a family of open-weight multimodal models ranging from 2.3B to 30.7B parameters. The flagship 12B Unified model eliminates separate encoders, processing text, images, audio, and video directly through a single decoder-only transformer with up to 256K token context window.
Comments
Loading...