
NVIDIA releases Nemotron-3-Nano-Omni-30B, a 31B-parameter multimodal model with 256K context and reasoning mode

TL;DR

NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a multimodal large language model with 31 billion parameters that processes video, audio, images, and text with up to 256K token context. The model uses a Mamba2-Transformer hybrid Mixture of Experts architecture and supports chain-of-thought reasoning mode.



NVIDIA released Nemotron-3-Nano-Omni-30B-A3B, a 31-billion-parameter multimodal model that processes video, audio, images, and text with a context length of up to 256,000 tokens. The model is available for commercial use under the NVIDIA Open Model Agreement.

Architecture and Specifications

Nemotron-3-Nano-Omni uses a Mamba2-Transformer hybrid Mixture-of-Experts (MoE) architecture with 31B total parameters, of which roughly 3B are active per token (the "A3B" suffix). The model combines three components:

  • Nemotron 3 Nano LLM (30B A3B) for language processing
  • CRADIO v4-H vision encoder for image and video
  • Parakeet speech encoder for audio

The model accepts video files up to 2 minutes at 1 FPS (1080p) or 2 FPS (720p), audio files up to 1 hour, and images in JPEG/PNG format. It supports English only.
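The stated input limits can be sketched as a small validation helper. The limits come from the article; the helper itself is hypothetical and not part of any NVIDIA SDK.

```python
# Illustrative input-limit check based on the constraints quoted above.
# The limits are from the model card as reported; this helper is a
# hypothetical convenience, not an official API.

VIDEO_FPS_BY_RES = {"1080p": 1, "720p": 2}   # frame sampling rate per resolution
MAX_VIDEO_SECONDS = 2 * 60                    # videos up to 2 minutes
MAX_AUDIO_SECONDS = 60 * 60                   # audio files up to 1 hour

def video_frame_budget(resolution: str, duration_s: int) -> int:
    """Number of frames the model would sample from a clip."""
    if duration_s > MAX_VIDEO_SECONDS:
        raise ValueError("video exceeds the 2-minute limit")
    fps = VIDEO_FPS_BY_RES[resolution]
    return duration_s * fps

print(video_frame_budget("720p", 90))  # 180 frames at 2 FPS
```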

Key Capabilities

According to NVIDIA, the model provides:

  • Video and speech comprehension
  • GUI automation and OCR
  • Speech transcription with word-level timestamps
  • JSON output format support
  • Chain-of-thought reasoning mode with configurable reasoning budget (up to 16,384 tokens)
  • Tool calling capabilities
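Assuming the model is served behind vLLM's OpenAI-compatible endpoint, a mixed image-plus-text request with JSON output might look like the sketch below. The model ID is an assumed Hugging Face path, and NVIDIA has not published the exact request schema, so treat this as illustrative only.

```python
# Hedged sketch of an OpenAI-style chat payload combining an image and a
# text prompt, in the shape vLLM's OpenAI-compatible server accepts.
# The model ID is an assumption; the JSON response_format reflects the
# article's claim of JSON output support.

def build_request(prompt: str, image_url: str) -> dict:
    """Assemble a multimodal chat-completion request payload."""
    return {
        "model": "nvidia/Nemotron-3-Nano-Omni-30B-A3B",  # assumed model ID
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        "response_format": {"type": "json_object"},  # structured JSON output
    }

req = build_request("List the UI buttons visible in this screenshot.",
                    "https://example.com/screen.png")
```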

Training and Development

NVIDIA states the model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b; specific training methodologies and benchmark scores were not disclosed.

Deployment Requirements

The model requires vLLM 0.20.0 and runs on NVIDIA Ampere, Hopper, Ada Lovelace, and Blackwell GPUs. Available precision formats include BF16 (~62GB), FP8, and NVFP4. For single-GPU deployment (tensor-parallel size 1), NVIDIA recommends a maximum model length of 131,072 tokens.
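The single-GPU configuration described above can be sketched as vLLM engine arguments. The model ID is an assumed Hugging Face path, and the values simply mirror the article's recommendations.

```python
# Hypothetical single-GPU vLLM configuration based on the recommendations
# above; the model ID is an assumption, not a confirmed repository path.

def engine_kwargs() -> dict:
    """vLLM engine arguments for single-GPU BF16 deployment."""
    return {
        "model": "nvidia/Nemotron-3-Nano-Omni-30B-A3B",  # assumed model ID
        "tensor_parallel_size": 1,     # single-GPU deployment
        "max_model_len": 131_072,      # recommended single-GPU context limit
        "dtype": "bfloat16",           # ~62 GB of weights in BF16
    }

# On a machine with sufficient GPU memory:
#   from vllm import LLM
#   llm = LLM(**engine_kwargs())
```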

Recommended inference parameters vary by mode:

  • Thinking mode: temperature 0.6, top_p 0.95, max_tokens 20,480
  • Instruct mode: temperature 0.2, top_k 1, max_tokens 1,024
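The per-mode settings above translate directly into sampling parameters for a chat-completion request. The mode-selection mechanism itself (e.g. a system-prompt flag) is not specified in the article, so this sketch only encodes the recommended sampling values.

```python
# Recommended sampling parameters per mode, as quoted in the article.
# How the model is switched between modes is not documented here; this
# only packages the per-mode values for an OpenAI-style request.

def sampling_for(mode: str) -> dict:
    """Return recommended sampling parameters for a given mode."""
    if mode == "thinking":
        return {"temperature": 0.6, "top_p": 0.95, "max_tokens": 20480}
    if mode == "instruct":
        return {"temperature": 0.2, "top_k": 1, "max_tokens": 1024}
    raise ValueError(f"unknown mode: {mode}")

print(sampling_for("thinking")["max_tokens"])  # 20480
```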

The model supports deployment on edge devices, including Jetson Thor, and on consumer hardware such as the RTX 5090.

Availability

Nemotron-3-Nano-Omni-30B is available on Hugging Face, build.nvidia.com, and NGC as of April 28, 2026. Supported runtime engines include vLLM, TensorRT-LLM, NeMo Megatron, llama.cpp, Ollama, and SGLang.

What This Means

NVIDIA's release targets enterprise multimodal applications that require unified processing of video, audio, and documents—use cases that previously required multiple specialized models. The 256K context window and reasoning mode position it for complex document analysis and extended video processing. The commercial license and edge deployment support (including consumer RTX 5090) differentiate it from research-focused multimodal models, though pricing and comparative benchmarks against competitors like GPT-4V or Gemini were not provided.


