model releaseNVIDIA

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

TL;DR

NVIDIA released Nemotron 3 Nano Omni, a 31B parameter (30B active, 3B per token) multimodal model supporting video, audio, image, and text inputs. The model features a 256K token context window, reasoning mode with chain-of-thought, and tool calling capabilities.

April 29, 2026 · 5:36 PM2 min read

NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning — Quick Specs

Context window256K tokens

Compare NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning with other models →

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA released Nemotron 3 Nano Omni on April 28, 2026, a 31B parameter multimodal model (30B active parameters, 3B per token) that processes video, audio, images, and text with a 256K token context window.

Model Architecture and Capabilities

The model uses a Mamba2-Transformer Hybrid Mixture of Experts (MoE) architecture, combining a Nemotron 3 Nano 30B LLM with CRADIO v4-H vision encoder and Parakeet speech encoder. According to NVIDIA, it supports video files up to 2 minutes (mp4, 1080p at 1 FPS/128 frames, 720p at 2 FPS/256 frames), audio files up to 1 hour (wav/mp3, 8kHz+ sampling), standard image formats (jpeg/png), and English text.

Key features include:

Reasoning mode with chain-of-thought output (reasoning budget: 16,384 tokens, grace period: 1,024 tokens)
JSON output format support
Tool calling functionality
Word-level timestamps for transcription
GUI and OCR capabilities

Availability and Hardware Requirements

The model is available in three precision formats on Hugging Face:

BF16 (~62GB)
FP8
NVFP4 (NVIDIA's 4-bit format)

NVIDIA specifies compatibility with Ampere (A100 80GB), Hopper (H100/H200), Blackwell (B200, RTX 5090, RTX Pro 6000 SE), and Lovelace (L40S) architectures. The model runs on vLLM 0.20.0, TensorRT LLM, llama.cpp, Ollama, and SGLang runtimes.

Training and Commercial Use

According to NVIDIA, the model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b. The model is available for commercial use under the NVIDIA Open Model Agreement.

NVIDIA targets enterprise use cases including customer service (video verification, OCR), media and entertainment (video analysis, dense captions), document intelligence (contracts, financial documents), and GUI automation for agentic applications.

Deployment Configuration

For single-GPU deployment (B200), NVIDIA recommends:

Thinking mode: temperature 0.6, top_p 0.95, max_tokens 20,480
Instruct mode: temperature 0.2, top_k 1, max_tokens 1,024
Maximum model length: 131,072 tokens (expandable to full 256K context)
FP8 KV cache for memory efficiency

The vLLM configuration supports up to 384 concurrent sequences with --max-num-seqs parameter.

What This Means

Nemotron 3 Nano Omni represents NVIDIA's push into unified multimodal processing for enterprise applications, directly competing with GPT-4V and Gemini 1.5 in video understanding. The 256K context window and 2-minute video support enable processing of full meeting recordings and training videos without chunking. The MoE architecture (3B active per token from 30B total) provides efficiency gains over dense models, though real-world performance benchmarks against competitors remain to be published. The reasoning mode positions it against o1-preview/o3-mini for tasks requiring step-by-step problem solving, while tool calling and JSON output support agentic workflows. Notably, NVIDIA provides GGUF quantizations via Unsloth for local deployment, expanding accessibility beyond datacenter GPUs to RTX 5090 and similar consumer hardware.

Source: huggingface.co ↗

NVIDIA Nemotron multimodal video understanding reasoning MoE enterprise AI vLLM

model releaseApril 29, 2026

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

NVIDIA has released Nemotron 3 Nano Omni 30B-A3B, a multimodal large language model with 31 billion parameters using a Mamba2-Transformer hybrid Mixture of Experts architecture. The model supports video, audio, image, and text inputs with a 256K token context window and includes a dedicated reasoning mode with chain-of-thought capabilities.

model releaseApril 28, 2026

Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter

Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.

model releaseApril 28, 2026

NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context

NVIDIA has launched Nemotron 3 Nano Omni on Amazon SageMaker JumpStart, a multimodal model with 30 billion total parameters (3 billion active) that processes video, audio, images, and text in a single inference pass. The model features a 131K token context window and uses a Mamba2 Transformer Hybrid MoE architecture combining three specialized encoders.

model releaseApril 28, 2026

NVIDIA Releases Nemotron 3 Nano Omni: 30B-A3B Multimodal Model With 100+ Page Document Support

NVIDIA released Nemotron 3 Nano Omni, a 30B-A3B Mixture-of-Experts model that processes text, images, video, and audio. The model uses a hybrid Mamba-Transformer architecture with 128 experts and achieves 65.8 on OCRBenchV2-En and 72.2 on Video-MME, while delivering up to 9x higher throughput on multimodal tasks compared to alternatives.

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning — Quick Specs

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

Model Architecture and Capabilities

Availability and Hardware Requirements

Training and Commercial Use

Deployment Configuration

What This Means

Related Articles

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter

NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context

NVIDIA Releases Nemotron 3 Nano Omni: 30B-A3B Multimodal Model With 100+ Page Document Support

Comments