Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter
Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model offers a 256,000-token context window, accepts text, image, video, and audio inputs while producing text output, and, according to Nvidia, delivers roughly 2× higher throughput for video reasoning than separate vision and speech pipelines.
Architecture and Performance Claims
According to Nvidia, Nemotron 3 Nano Omni is built on a hybrid MoE (Mixture of Experts) Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS). The company claims the model delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning compared to separate vision and speech pipelines.
The model designation "30B-A3B" indicates 30 billion total parameters with roughly 3 billion active per token, typical of MoE architectures, which route each token through only a small subset of the experts.
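To make the active-versus-total distinction concrete, here is a minimal, illustrative sketch of top-k expert routing in a sparse MoE layer; the expert count, dimensions, and routing details are toy values, not Nvidia's actual configuration.

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Route each token to its top-k experts. Only those experts' weights
    are used, so the 'active' parameter count per token is a small
    fraction of the total. Sizes here are illustrative only."""
    logits = x @ router_w                             # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]     # indices of the chosen experts
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates) / np.exp(gates).sum(-1, keepdims=True)  # softmax over chosen experts

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                       # naive per-token dispatch, for clarity
        for slot in range(top_k):
            e = top[t, slot]
            out[t] += gates[t, slot] * (x[t] @ experts[e])
    return out

# Toy setup: 8 experts, 2 active per token, so only ~1/4 of expert weights are used per token.
rng = np.random.default_rng(0)
d, n_experts, tokens = 16, 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(tokens, d))
print(moe_layer(x, experts, router_w).shape)          # (4, 16)
```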
Context and Reasoning Capabilities
The model supports a context length of up to 300,000 tokens according to Nvidia's specifications, though OpenRouter lists 256,000 tokens. It includes a 16,384-token reasoning budget and supports extended thinking via a reasoning.enabled parameter on OpenRouter.
This capability lets the model expose step-by-step thinking, with reasoning details returned as part of the API response. OpenRouter's implementation preserves the complete reasoning details when a conversation is continued.
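As an illustration of enabling this on OpenRouter, the sketch below sends a chat completion request with the reasoning parameter turned on. The model slug used here is an assumption for illustration only; check OpenRouter's listing for the exact identifier.

```python
import os
import requests  # plain HTTP call; OpenRouter also offers SDKs

# NOTE: the model slug below is assumed for illustration; use the
# identifier shown on OpenRouter's model page.
MODEL = "nvidia/nemotron-3-nano-omni"

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Summarize the attached clip."}],
        "reasoning": {"enabled": True},  # turn on extended thinking
    },
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("content"))
# Reasoning details, when returned, ride along on the message object.
print(message.get("reasoning"))
```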
Availability and Pricing
Nemotron 3 Nano Omni is available now on OpenRouter at $0 per million tokens for both input and output. The model was released on April 28, 2025, according to OpenRouter's listing.
Nvidia positions the model as "a perception and context sub-agent in enterprise agent systems," designed to enable agents to perceive and reason across modalities in a single inference loop.
Technical Integration
The model is accessible through OpenRouter's API, which normalizes requests and responses across providers. Developers can integrate it using OpenRouter's own SDK, any OpenAI-compatible SDK pointed at OpenRouter's endpoint, or raw API calls, as sketched below.
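A minimal sketch of the OpenAI-compatible route, pointing the standard openai Python client at OpenRouter and sending a text-plus-image request; the model slug is again an assumption for illustration.

```python
from openai import OpenAI  # OpenAI-compatible client pointed at OpenRouter

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

# Model slug assumed for illustration; check OpenRouter's listing for the real one.
completion = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-omni",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/frame.jpg"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```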
What This Means
Nemotron 3 Nano Omni represents Nvidia's entry into freely available multimodal models, competing directly with open-source alternatives. The hybrid MoE architecture suggests an efficiency-focused design, though independent benchmarks are needed to verify Nvidia's throughput and compute claims. The free pricing removes barriers for developers experimenting with multimodal enterprise agent systems, potentially accelerating adoption of video and audio understanding in production applications. The model's positioning as a "sub-agent" indicates Nvidia envisions it as a component in larger agent architectures rather than a standalone general-purpose model.
Related Articles
NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context
NVIDIA has launched Nemotron 3 Nano Omni on Amazon SageMaker JumpStart, a multimodal model with 30 billion total parameters (3 billion active) that processes video, audio, images, and text in a single inference pass. The model features a 131K token context window and uses a Mamba2 Transformer Hybrid MoE architecture combining three specialized encoders.
NVIDIA Releases Nemotron 3 Nano Omni: 30B-A3B Multimodal Model With 100+ Page Document Support
NVIDIA released Nemotron 3 Nano Omni, a 30B-A3B Mixture-of-Experts model that processes text, images, video, and audio. The model uses a hybrid Mamba-Transformer architecture with 128 experts and achieves 65.8 on OCRBenchV2-En and 72.2 on Video-MME, while delivering up to 9x higher throughput on multimodal tasks compared to alternatives.
Alibaba Releases Qwen3.6 Max Preview: 1 Trillion Parameter MoE Model With 262K Context Window
Alibaba Cloud has released Qwen3.6 Max Preview, a proprietary frontier model built on sparse mixture-of-experts architecture with approximately 1 trillion total parameters. The model supports a 262,144-token context window and features integrated thinking mode for multi-turn reasoning, priced at $1.30 per million input tokens and $7.80 per million output tokens.
Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing
Alibaba's Qwen Team has released Qwen3.6 27B, a 27-billion parameter multimodal language model with a 262,144-token context window. The model accepts text, image, and video inputs and includes a built-in thinking mode for extended reasoning, with pricing at $0.195 per million input tokens and $1.56 per million output tokens.