Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter
Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model offers a 256,000-token context window, accepts text, image, video, and audio inputs while producing text output, and, according to Nvidia, delivers roughly 2× higher throughput for video reasoning than separate vision and speech pipelines.
Architecture and Performance Claims
According to Nvidia, Nemotron 3 Nano Omni is built on a hybrid MoE (Mixture of Experts) Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS). The company claims the model delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning compared to separate vision and speech pipelines.
The model designation "30B-A3B" indicates 30 billion total parameters with roughly 3 billion active per token, typical of MoE architectures, which route each token through only a small subset of the experts.
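To make the active-versus-total distinction concrete, here is a minimal, illustrative sketch of top-k expert routing in a sparse MoE layer; the expert count, dimensions, and routing details are toy values, not Nvidia's actual configuration.

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Route each token to its top-k experts. Only those experts' weights
    are used, so the 'active' parameter count per token is a small
    fraction of the total. Sizes here are illustrative only."""
    logits = x @ router_w                             # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]     # indices of the chosen experts
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates) / np.exp(gates).sum(-1, keepdims=True)  # softmax over chosen experts

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                       # naive per-token dispatch, for clarity
        for slot in range(top_k):
            e = top[t, slot]
            out[t] += gates[t, slot] * (x[t] @ experts[e])
    return out

# Toy setup: 8 experts, 2 active per token, so only ~1/4 of expert weights are used per token.
rng = np.random.default_rng(0)
d, n_experts, tokens = 16, 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(tokens, d))
print(moe_layer(x, experts, router_w).shape)          # (4, 16)
```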
Context and Reasoning Capabilities
The model supports a context length of up to 300,000 tokens according to Nvidia's specifications, though OpenRouter lists 256,000 tokens. It includes a 16,384-token reasoning budget and supports extended thinking via a reasoning.enabled parameter on OpenRouter.
This capability lets the model expose step-by-step thinking, with reasoning details returned as part of the API response. OpenRouter's implementation preserves the complete reasoning details when a conversation is continued.
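As an illustration of enabling this on OpenRouter, the sketch below sends a chat completion request with the reasoning parameter turned on. The model slug used here is an assumption for illustration only; check OpenRouter's listing for the exact identifier.

```python
import os
import requests  # plain HTTP call; OpenRouter also offers SDKs

# NOTE: the model slug below is assumed for illustration; use the
# identifier shown on OpenRouter's model page.
MODEL = "nvidia/nemotron-3-nano-omni"

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Summarize the attached clip."}],
        "reasoning": {"enabled": True},  # turn on extended thinking
    },
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("content"))
# Reasoning details, when returned, ride along on the message object.
print(message.get("reasoning"))
```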
Availability and Pricing
Nemotron 3 Nano Omni is available now on OpenRouter at $0 per million tokens for both input and output. The model was released on April 28, 2025, according to OpenRouter's listing.
Nvidia positions the model as "a perception and context sub-agent in enterprise agent systems," designed to enable agents to perceive and reason across modalities in a single inference loop.
Technical Integration
The model is accessible through OpenRouter's API, which normalizes requests and responses across providers. Developers can integrate it using OpenRouter's own SDK, any OpenAI-compatible SDK pointed at OpenRouter's endpoint, or raw API calls, as sketched below.
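A minimal sketch of the OpenAI-compatible route, pointing the standard openai Python client at OpenRouter and sending a text-plus-image request; the model slug is again an assumption for illustration.

```python
from openai import OpenAI  # OpenAI-compatible client pointed at OpenRouter

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

# Model slug assumed for illustration; check OpenRouter's listing for the real one.
completion = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-omni",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/frame.jpg"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```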
What This Means
Nemotron 3 Nano Omni represents Nvidia's entry into freely available multimodal models, competing directly with open-source alternatives. The hybrid MoE architecture suggests an efficiency-focused design, though independent benchmarks are needed to verify Nvidia's throughput and compute claims. The free pricing removes barriers for developers experimenting with multimodal enterprise agent systems, potentially accelerating adoption of video and audio understanding in production applications. The model's positioning as a "sub-agent" indicates Nvidia envisions it as a component in larger agent architectures rather than a standalone general-purpose model.
Related Articles
NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context
NVIDIA has launched Nemotron 3 Nano Omni on Amazon SageMaker JumpStart, a multimodal model with 30 billion total parameters (3 billion active) that processes video, audio, images, and text in a single inference pass. The model features a 131K token context window and uses a Mamba2 Transformer Hybrid MoE architecture combining three specialized encoders.
NVIDIA Releases Nemotron 3 Nano Omni: 30B-A3B Multimodal Model With 100+ Page Document Support
NVIDIA released Nemotron 3 Nano Omni, a 30B-A3B Mixture-of-Experts model that processes text, images, video, and audio. The model uses a hybrid Mamba-Transformer architecture with 128 experts and achieves 65.8 on OCRBenchV2-En and 72.2 on Video-MME, while delivering up to 9x higher throughput on multimodal tasks compared to alternatives.
Alibaba Releases Qwen3.6 Max Preview: 1 Trillion Parameter MoE Model With 262K Context Window
Alibaba Cloud has released Qwen3.6 Max Preview, a proprietary frontier model built on sparse mixture-of-experts architecture with approximately 1 trillion total parameters. The model supports a 262,144-token context window and features integrated thinking mode for multi-turn reasoning, priced at $1.30 per million input tokens and $7.80 per million output tokens.
Alibaba's Qwen Team Releases Qwen3.6 27B With 262K Context Window and Video Processing
Alibaba's Qwen Team has released Qwen3.6 27B, a 27-billion parameter multimodal language model with a 262,144-token context window. The model accepts text, image, and video inputs and includes a built-in thinking mode for extended reasoning, with pricing at $0.195 per million input tokens and $1.56 per million output tokens.