NVIDIA

GPU maker and AI infrastructure provider

https://nvidia.com

News

model release · NVIDIA

NVIDIA releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows

NVIDIA has released Nemotron 3 Ultra, a 550-billion-parameter mixture-of-experts model with 55 billion active parameters and support for context windows of up to 1 million tokens. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows, including agent orchestration, coding agents, and complex enterprise tasks.

2 min read
model release · NVIDIA

NVIDIA Releases Nemotron-3-Ultra: 550B Parameter Model with 1M Token Context and Configurable Reasoning

NVIDIA released Nemotron-3-Ultra-550B-A55B-NVFP4, a 550B-parameter model with 55B active parameters, a 1M-token context window, and a configurable reasoning mode. The model uses a hybrid LatentMoE architecture combining Mamba-2, Mixture-of-Experts, and Attention layers with Multi-Token Prediction, and was trained with NVIDIA's NVFP4 quantization-aware approach.

2 min read
model release · NVIDIA

NVIDIA releases Nemotron-3-Ultra: 550B parameter model with 1M token context and configurable reasoning

NVIDIA released Nemotron-3-Ultra-550B, a frontier-scale model with 550B total parameters (55B active) and a context window of up to 1M tokens. The model uses a hybrid LatentMoE architecture combining Mamba-2, MoE, and attention layers with Multi-Token Prediction, and was trained with NVFP4 quantization-aware methods from December 2025 to April 2026.

2 min read
model release · NVIDIA

NVIDIA Releases Nemotron 3.5 Content Safety: 4B-Parameter Multimodal Model with Custom Policy Enforcement and 140-Langua

NVIDIA has released Nemotron 3.5 Content Safety, a 4B-parameter model built on Google Gemma 3 4B IT that provides multimodal safety classification across approximately 140 languages. The model offers a 128K context window, custom enterprise policy enforcement, and auditable reasoning traces; NVIDIA is also releasing the model's training dataset.

3 min read
model release · NVIDIA

NVIDIA Releases Cosmos3-Super-Text2Image: 64B Parameter Model for Physical AI Applications

NVIDIA released Cosmos3-Super-Text2Image, a 64-billion-parameter text-to-image generation model, as part of its Cosmos3 collection of omnimodal world models. The model uses a Mixture-of-Transformers architecture that combines autoregressive and diffusion transformers, and is designed for Physical AI applications including robotics and autonomous vehicles.

2 min read
model release · NVIDIA

NVIDIA Releases Cosmos 3: 64B-Parameter Omnimodal World Model for Physical AI

NVIDIA released Cosmos 3, an omnimodal world foundation model platform for Physical AI spanning robotics, autonomous driving, and industrial environments. The flagship Cosmos3-Super variant contains 64 billion parameters and generates video, images, audio, and action commands from text, image, video, and action trajectory inputs using a Mixture-of-Transformers architecture.

2 min read
model release · NVIDIA

NVIDIA Releases Cosmos3-Super: 64B-Parameter Omnimodal World Model for Physical AI

NVIDIA released Cosmos3-Super, a 64-billion parameter omnimodal foundation model that generates video, images, audio, and action commands from combinations of text, image, video, and action trajectory inputs. The model, part of the Cosmos3 collection, targets Physical AI applications including robotics, autonomous vehicles, and industrial automation.

2 min read
model release · NVIDIA

NVIDIA Releases Cosmos3-Nano: 16B-Parameter Omnimodal World Model for Physical AI with 256K Token Context

NVIDIA has released Cosmos3-Nano, a 16-billion parameter omnimodal world model capable of generating video, audio, images, and robot action commands from combinations of text, image, video, and action trajectory inputs. The model supports a 256K token context window and is designed for Physical AI applications including robotics, autonomous vehicles, and smart manufacturing environments.

2 min read
model release · NVIDIA

NVIDIA Releases Cosmos 3: 8B and 32B Omni-Models Combining Video Generation, Reasoning, and Action in Single Architecture

NVIDIA has released Cosmos 3, a unified omni-model that combines world generation, physical reasoning, and action generation in a single architecture. Available in 8B (Nano) and 32B (Super) parameter versions on Hugging Face, Cosmos 3 uses a Mixture-of-Transformers architecture to process text, image, video, audio, and action modalities without switching between separate models.

2 min read
model release · NVIDIA

NVIDIA releases LocateAnything-3B vision-language model with 2.5× faster object detection via parallel box decoding

NVIDIA released LocateAnything-3B, a 3-billion parameter vision-language model that predicts bounding boxes in parallel rather than token-by-token, achieving up to 2.5× higher throughput compared to autoregressive approaches. The model, trained on 12M images with 138M+ queries and 785M bounding boxes, supports object detection, GUI element grounding, and robotics perception.

2 min read
research · NVIDIA

NVIDIA Releases Nemotron-Labs Diffusion Models With 6.4× Faster Token Generation Than Autoregressive Decoding

NVIDIA has released Nemotron-Labs Diffusion, a family of diffusion language models at 3B, 8B, and 14B scales that generate multiple tokens in parallel rather than one at a time. In self-speculation mode, the 8B model generates 6.4× more tokens per forward pass than autoregressive models while maintaining comparable accuracy.

2 min read
model release · NVIDIA

NVIDIA releases Nemotron-Labs-Diffusion-14B with tri-mode decoding achieving 3.3x speed-up on GB200

NVIDIA released Nemotron-Labs-Diffusion-14B, a 14-billion parameter language model that supports three decoding modes by switching attention patterns during inference. The model achieves 850 tokens per second on GB200 hardware at concurrency 1, representing a 3.3x speed-up over standard autoregressive decoding and outperforming Qwen3-8B-Eagle3 by 2.2x in self-speculation mode.

2 min read
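
The headline figures above can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope illustration, not NVIDIA's methodology: it assumes a reported speed-up is the ratio of two throughputs measured under the same conditions, and that "active parameters" means the portion of a mixture-of-experts model used per token. The function names are hypothetical.

```python
def implied_baseline(throughput: float, speedup: float) -> float:
    """Baseline throughput implied by a measured throughput and a speed-up ratio."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return throughput / speedup


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a model's total parameters that are active per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


if __name__ == "__main__":
    # Figures quoted in the news items above: 850 tokens/s at a 3.3x speed-up,
    # and 55B active out of 550B total parameters.
    baseline = implied_baseline(850.0, 3.3)
    share = active_fraction(55e9, 550e9)
    print(f"implied autoregressive baseline: {baseline:.0f} tokens/s")
    print(f"active parameter share: {share:.0%}")
```

Under those assumptions, the 850 tokens per second reported for Nemotron-Labs-Diffusion-14B implies an autoregressive baseline of roughly 258 tokens per second, and Nemotron 3 Ultra activates about 10% of its parameters per token.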

Models

Llama 3.1 Nemotron 70B Instruct

NVIDIA

active
Context: 128K
Input/1M: $0.2

Nemotron 3 Ultra

NVIDIA

active
Context: 1000K
Input/1M: $0.5

Jun 5, 2026

NVIDIA Nemotron 3 Ultra

NVIDIA

active
Context: 1000K

Jun 4, 2026

Nemotron 3.5 ASR

NVIDIA

active

Jun 4, 2026

Nemotron 3.5 Content Safety

NVIDIA

active
Context: 128K
0

Jun 4, 2026

Nemotron 3.5 Content Safety

NVIDIA

active
Context: 128K

Jun 4, 2026

Nemotron-3-Ultra-550B-A55B

NVIDIA

active
Context: 1000K

Jun 4, 2026

Cosmos3-Nano

NVIDIA

active
Context: 256K

Jun 2, 2026

Cosmos 3 Super

NVIDIA

active

Jun 1, 2026

NVIDIA Cosmos3-Super

NVIDIA

active
Context: 256K

May 31, 2026

Cosmos 3 Super Image2Video

NVIDIA

active
Context: 262K

May 31, 2026

Cosmos3-Super-Text2Image

NVIDIA

active
Context: 4K

May 31, 2026

LocateAnything-3B

NVIDIA

active
Context: 24K

May 26, 2026

Nemotron-Labs Diffusion 8B

NVIDIA

active

May 23, 2026

Nemotron 3 Nano Omni

NVIDIA

active
Context: 131K

Apr 28, 2026

Nemotron 3 Nano Omni 30B-A3B-Reasoning

NVIDIA

active
Context: 256K

Apr 28, 2026

NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning

NVIDIA

active
Context: 256K

Apr 28, 2026

Nemotron-3-Nano-Omni-30B-A3B

NVIDIA

active
Context: 256K

Apr 28, 2026

NVIDIA Isaac GR00T N1.7

NVIDIA

active

Apr 17, 2026

Gemma 4 31B IT NVFP4

NVIDIA

active
Context: 262K

Apr 2, 2026

gpt-oss-puzzle-88B

NVIDIA

active
Context: 128K

Mar 26, 2026

NVIDIA Nemotron-3-Nano-4B-GGUF

NVIDIA

active
Context: 262K

Mar 16, 2026

Nemotron 3 Super

NVIDIA

active
Context: 1000K
Input/1M: $0.1

Mar 11, 2026

NVIDIA Nemotron-3-Super-120B-A12B

NVIDIA

active
Context: 1000K
Input/1M: $0.2

Mar 10, 2026

Nemotron 3 Content Safety 4B

NVIDIA

active
Context: 128K

Mar 20, 2025

NVIDIA Nemotron 3 Super

NVIDIA

active
Context: 256K

Jan 8, 2025

Nvidia Llama 3.1 Nemotron 70B Instruct

NVIDIA

active
Context: 131K
Input/1M: $1.2

Oct 15, 2024

Top Benchmark Scores

70.63%
865 tokens_per_sec

SWE-bench Verified

Nemotron-3-Ultra-550B-A55B
71.9%