model releaseNVIDIA

NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

TL;DR

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.

May 2, 2026 · 9:06 PM2 min read

Nemotron-3-Nano-Omni-30B-A3B — Quick Specs

Context window256K tokens

Compare Nemotron-3-Nano-Omni-30B-A3B with other models →

NVIDIA Releases Nemotron-3-Nano-Omni-30B with Multimodal Processing and Reasoning Mode

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a 31 billion-parameter multimodal model that processes video, audio, images, and text with up to 256,000 token context length. The model is available commercially under the NVIDIA Open Model Agreement.

Architecture and Specifications

Nemotron-3-Nano-Omni uses a Mamba2-Transformer hybrid Mixture of Experts (MoE) architecture with 31B total parameters and 3B active parameters (A3B). The model combines three specialized encoders:

Nemotron 3 Nano LLM (30B A3B) for language processing
CRADIO v4-H vision encoder for image and video
Parakeet speech encoder for audio

The model accepts video files up to 2 minutes at 1 FPS (1080p) or 2 FPS (720p), audio files up to 1 hour, and images in JPEG/PNG format. It supports English only.

Key Capabilities

According to NVIDIA, the model provides:

Video and speech comprehension
GUI automation and OCR
Speech transcription with word-level timestamps
JSON output format support
Chain-of-thought reasoning mode with configurable reasoning budget (up to 16,384 tokens)
Tool calling capabilities

Training and Development

NVIDIA states the model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b. Specific training methodologies and benchmark scores were not disclosed.

Deployment Requirements

The model requires vLLM 0.20.0 and runs on NVIDIA Ampere, Hopper, Blackwell, and Lovelace GPUs. Available precision formats include BF16 (~62GB), FP8, and NVFP4. NVIDIA recommends 131,072 maximum model length for single-GPU deployment with tensor-parallel-size 1.

Recommended inference parameters vary by mode:

Thinking mode: temperature 0.6, top_p 0.95, max_tokens 20,480
Instruct mode: temperature 0.2, top_k 1, max_tokens 1,024

The model supports deployment on edge devices including Jetson Thor and consumer hardware like RTX 5090.

Availability

Nemotron-3-Nano-Omni-30B is available on Hugging Face, Build.Nvidia.com, and NGC as of April 28, 2026. Runtime engines include vLLM, TensorRT-LLM, NeMo Megatron, llama.cpp, Ollama, and SGLang.

What This Means

NVIDIA's release targets enterprise multimodal applications that require unified processing of video, audio, and documents—use cases that previously required multiple specialized models. The 256K context window and reasoning mode position it for complex document analysis and extended video processing. The commercial license and edge deployment support (including consumer RTX 5090) differentiate it from research-focused multimodal models, though pricing and comparative benchmarks against competitors like GPT-4V or Gemini were not provided.

Source: huggingface.co ↗

NVIDIA Nemotron multimodal MoE reasoning video understanding speech transcription OCR

model releaseApril 29, 2026

NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

NVIDIA has released Nemotron 3 Nano Omni 30B-A3B, a multimodal large language model with 31 billion parameters using a Mamba2-Transformer hybrid Mixture of Experts architecture. The model supports video, audio, image, and text inputs with a 256K token context window and includes a dedicated reasoning mode with chain-of-thought capabilities.

model releaseApril 29, 2026

NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode

NVIDIA released Nemotron 3 Nano Omni, a 31B parameter (30B active, 3B per token) multimodal model supporting video, audio, image, and text inputs. The model features a 256K token context window, reasoning mode with chain-of-thought, and tool calling capabilities.

model releaseApril 28, 2026

Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter

Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.