NVIDIA Releases Nemotron 3 Nano Omni: 31B-Parameter Multimodal Model with 256K Context and Reasoning Mode

TL;DR

NVIDIA has released Nemotron 3 Nano Omni 30B-A3B, a multimodal large language model with 31 billion parameters using a Mamba2-Transformer hybrid Mixture of Experts architecture. The model supports video, audio, image, and text inputs with a 256K token context window and includes a dedicated reasoning mode with chain-of-thought capabilities.


NVIDIA has released Nemotron 3 Nano Omni 30B-A3B-Reasoning, a multimodal large language model with 31 billion total parameters, built on a 30B-A3B Mixture of Experts base that activates roughly 3 billion parameters per token. The model launched on April 28, 2026, on Hugging Face, build.nvidia.com, and NGC.

Architecture and Capabilities

Nemotron 3 Nano Omni uses a Mamba2-Transformer hybrid Mixture of Experts (MoE) architecture, combining a 30B-A3B Nemotron 3 Nano LLM base with a CRADIO v4-H vision encoder and a Parakeet speech encoder. The model supports the following inputs (a usage sketch follows the list):

  • Video: MP4 files up to 2 minutes, sampling at 1-2 FPS depending on resolution (up to 256 frames for 720p)
  • Audio: WAV and MP3 files up to 1 hour, with sampling rates of 8 kHz or higher
  • Images: JPEG and PNG formats
  • Text: English only
  • Context window: 256,000 tokens
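For concreteness, here is a minimal sketch of single-image inference through vLLM's offline Python API, the inference path the article cites below. The Hugging Face repo name, the `<image>` placeholder, and the prompt format are assumptions; the model card defines the actual template.

```python
# Minimal offline multimodal inference sketch with vLLM.
# The model ID and "<image>" placeholder are assumptions, not confirmed usage.
from vllm import LLM, SamplingParams
from PIL import Image

MODEL_ID = "nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning"  # assumed repo name

llm = LLM(model=MODEL_ID, trust_remote_code=True, max_model_len=262144)  # 256K window

image = Image.open("contract_page.png")  # hypothetical input image

outputs = llm.generate(
    {
        "prompt": "<image>\nSummarize the key terms on this page.",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(temperature=0.2, max_tokens=512),
)
print(outputs[0].outputs[0].text)
```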

The model includes two operating modes: a reasoning mode with chain-of-thought capabilities (using a 16,384-token reasoning budget) and a standard instruct mode. It supports JSON output formatting, tool calling, and word-level timestamps for transcription tasks.
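The model card defines how the two modes are selected; earlier Nemotron releases toggled reasoning through a system-prompt directive, so the sketch below assumes a similar switch and a local OpenAI-compatible vLLM server. Both the "/think" directive and the model ID are assumptions.

```python
# Sketch: requesting reasoning mode through an OpenAI-compatible endpoint.
# The "/think" system directive is borrowed from earlier Nemotron releases
# and is an assumption here, as is the model ID.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning",  # assumed ID
    messages=[
        {"role": "system", "content": "/think"},  # assumed reasoning toggle
        {"role": "user", "content": "A clip shows 5 cars arriving and 3 leaving. Net change?"},
    ],
    max_tokens=16384,  # leaves room for the stated reasoning budget
)
print(response.choices[0].message.content)
```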

Training and Development

According to NVIDIA, the model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b, though specific training methodology details were not disclosed.

Deployment and Availability

The model is available in three precision formats (rough weight-memory math follows the list):

  • BF16 (approximately 62 GB)
  • FP8
  • NVFP4 (NVIDIA's 4-bit format)
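Weight memory scales with bytes per parameter, so the quantized sizes can be sanity-checked from the 31B parameter count; the arithmetic below reproduces the roughly 62 GB BF16 figure. Real checkpoints mix precisions, so treat the FP8 and NVFP4 numbers as ballpark floors.

```python
# Back-of-the-envelope weight memory at each precision, assuming all 31B
# parameters are stored at the stated width (an approximation).
PARAMS = 31e9
for name, bytes_per_param in [("BF16", 2), ("FP8", 1), ("NVFP4", 0.5)]:
    print(f"{name}: ~{PARAMS * bytes_per_param / 1e9:.0f} GB")
# BF16: ~62 GB, FP8: ~31 GB, NVFP4: ~16 GB
```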

NVIDIA claims the model is optimized for NVIDIA Ampere, Hopper, Ada Lovelace, and Blackwell architectures, with specific support for A100, H100, H200, B200, L40S, RTX Pro 6000 SE, and RTX 5090 GPUs. The model requires vLLM 0.20.0 for inference.

The model is released under the NVIDIA Open Model Agreement and is available for commercial use.

Target Use Cases

NVIDIA positions Nemotron 3 Nano Omni for enterprise applications including:

  • Customer service (video verification, drive-thru order processing)
  • Media and entertainment video analysis
  • Document intelligence for contracts and financial documents
  • GUI automation for AI agents
  • Meeting transcription and summarization (see the sketch after this list)
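The transcription case maps naturally onto the same vLLM offline API used above, with audio passed as a waveform and sampling rate. The `<audio>` placeholder token and the timestamp-request phrasing are assumptions; the model card defines the real template.

```python
# Sketch of the meeting-transcription use case via vLLM's offline API.
# The "<audio>" placeholder and timestamp prompt are assumptions.
import librosa
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning",  # assumed repo name
    trust_remote_code=True,
)

audio, sr = librosa.load("meeting.wav", sr=16000)  # hypothetical recording

outputs = llm.generate(
    {
        "prompt": "<audio>\nTranscribe this meeting with word-level timestamps.",
        "multi_modal_data": {"audio": (audio, sr)},
    },
    SamplingParams(temperature=0.0, max_tokens=2048),
)
print(outputs[0].outputs[0].text)
```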

What This Means

Nemotron 3 Nano Omni represents NVIDIA's entry into the multimodal reasoning model category, directly competing with models like GPT-4o and Claude 3.5 Sonnet. The 256K context window and dedicated reasoning mode position it for enterprise document processing tasks. However, the English-only limitation and lack of disclosed benchmark scores make performance comparisons difficult. The MoE architecture at 31B parameters suggests efficiency goals, though actual inference costs and speeds on various hardware remain to be independently verified. The model's integration with vLLM and support for NVIDIA's NVFP4 quantization indicate a focus on deployment flexibility across NVIDIA's hardware ecosystem.
