Nemotron

8 articles tagged with Nemotron

June 4, 2026
model release, NVIDIA

NVIDIA Releases Nemotron 3.5 ASR: 600M-Parameter Streaming Speech Model for 40 Languages

NVIDIA released Nemotron 3.5 ASR, a 600M-parameter speech-to-text model supporting 40 language-locales from a single checkpoint. The model achieves 0.07 seconds to final transcript after speech ends and ranks 2nd in latency among streaming ASR models according to Artificial Analysis benchmarks.

May 22, 2026
model release, NVIDIA

NVIDIA releases Nemotron-Labs-Diffusion-14B with tri-mode decoding achieving 3.3x speed-up on GB200

NVIDIA released Nemotron-Labs-Diffusion-14B, a 14-billion-parameter language model that supports three decoding modes by switching attention patterns during inference. The model achieves 850 tokens per second on GB200 hardware at concurrency 1, a 3.3x speed-up over standard autoregressive decoding, and outperforms Qwen3-8B-Eagle3 by 2.2x in self-speculation mode.

May 2, 2026
model release, NVIDIA

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with a context window of up to 256K tokens. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports a chain-of-thought reasoning mode.

April 29, 2026
model release, NVIDIA

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA released Nemotron 3 Nano Omni, a 31B-parameter multimodal model (designated 30B-A3B, with about 3B parameters active per token) supporting video, audio, image, and text inputs. The model features a 256K token context window, a reasoning mode with chain-of-thought, and tool calling capabilities.

model release, NVIDIA

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

NVIDIA has released Nemotron 3 Nano Omni 30B-A3B, a multimodal large language model with 31 billion parameters using a Mamba2-Transformer hybrid Mixture of Experts architecture. The model supports video, audio, image, and text inputs with a 256K token context window and includes a dedicated reasoning mode with chain-of-thought capabilities.

April 28, 2026
model release, NVIDIA

NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context

NVIDIA has launched Nemotron 3 Nano Omni, a multimodal model with 30 billion total parameters (3 billion active) that processes video, audio, images, and text in a single inference pass, on Amazon SageMaker JumpStart. The model features a 131K token context window and uses a Mamba2 Transformer Hybrid MoE architecture combining three specialized encoders.

model release, NVIDIA

NVIDIA Releases Nemotron 3 Nano Omni: 30B-A3B Multimodal Model With 100+ Page Document Support

NVIDIA released Nemotron 3 Nano Omni, a 30B-A3B Mixture-of-Experts model that processes text, images, video, and audio. The model uses a hybrid Mamba-Transformer architecture with 128 experts and achieves 65.8 on OCRBenchV2-En and 72.2 on Video-MME, while delivering up to 9x higher throughput on multimodal tasks compared to alternatives.

March 23, 2026
model release, NVIDIA

NVIDIA releases Nemotron-3-Nano-4B, a 4B-parameter model for edge AI with 262K context window

NVIDIA released Nemotron-3-Nano-4B-GGUF on March 16, 2026, a 4-billion-parameter small language model (SLM) designed for edge deployment on devices such as Jetson Thor and GeForce RTX. The model features a hybrid Mamba-2 and Transformer architecture with a 262K token context window and supports both reasoning and non-reasoning modes, selected via system prompts.