Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter
Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.
Nemotron-3-Nano-Omni-30B-A3B — Quick Specs
Nvidia Releases Nemotron 3 Nano Omni: Free 30B Multimodal Model
Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model now available free through OpenRouter. The model supports a 256,000-token context window and accepts text, image, video, and audio inputs while producing text output.
Architecture and Performance Claims
According to Nvidia, Nemotron 3 Nano Omni is built on a hybrid MoE (Mixture of Experts) Transformer-Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS). The company claims the model delivers approximately 2× higher throughput and 2.5× lower compute for video reasoning compared to separate vision and speech pipelines.
The model designation "30B-A3B" indicates 30 billion total parameters with 3 billion active parameters per inference, typical of MoE architectures that activate only a subset of parameters per token.
Context and Reasoning Capabilities
The model supports up to 300,000 context length according to Nvidia's specifications, though OpenRouter lists 256,000 tokens. It includes a 16,384 reasoning budget and supports extended thinking via a reasoning.enabled parameter on OpenRouter.
This reasoning capability allows the model to show step-by-step thinking processes, with reasoning details accessible through the API response. OpenRouter's implementation preserves complete reasoning details when continuing conversations.
Availability and Pricing
Nemotron 3 Nano Omni is available now on OpenRouter at $0 per million input tokens and $0 per million output tokens. The model was released on April 28, 2025, according to OpenRouter's listing.
Nvidia positions the model as "a perception and context sub-agent in enterprise agent systems," designed to enable agents to perceive and reason across modalities in a single inference loop.
Technical Integration
The model is accessible through OpenRouter's API, which normalizes requests and responses across providers. Developers can use OpenRouter's SDK, OpenAI-compatible SDK, or raw API calls to integrate the model.
What This Means
Nemotron 3 Nano Omni represents Nvidia's entry into freely available multimodal models, competing directly with open-source alternatives. The hybrid MoE architecture suggests an efficiency-focused design, though independent benchmarks are needed to verify Nvidia's throughput and compute claims. The free pricing removes barriers for developers experimenting with multimodal enterprise agent systems, potentially accelerating adoption of video and audio understanding in production applications. The model's positioning as a "sub-agent" indicates Nvidia envisions it as a component in larger agent architectures rather than a standalone general-purpose model.
Related Articles
Nex AGI Releases Nex-N2-Pro: 17B Active Parameter MoE Model with 262K Context Window
Nex AGI has released Nex-N2-Pro, a mixture-of-experts model with 17 billion active parameters from a total of 397 billion parameters. Built on the Qwen3.5 architecture, the model offers a 262,144 token context window and is available for free through OpenRouter.
Nex AGI Releases Nex-N2-Pro: 397B Parameter MoE Model With 262K Context, Available Free
Nex AGI has released Nex-N2-Pro, an agentic mixture-of-experts model with 397B total parameters and 17B active parameters. The model features a 262K token context window and is available free via OpenRouter's API.
Nvidia releases Nemotron 3 Ultra: 550B-parameter MoE model with 1M context window for agentic workflows
Nvidia has released Nemotron 3 Ultra, a 550-billion parameter mixture-of-experts model with 55 billion active parameters and support for up to 1 million token context windows. The model uses a hybrid Transformer-Mamba architecture and is designed specifically for long-running agentic workflows including agent orchestration, coding agents, and complex enterprise tasks.
Moonshot AI releases Kimi K2.7 Code with 1T parameters, 256K context window, 30% lower thinking token usage
Moonshot AI has released Kimi K2.7 Code, a 1 trillion parameter Mixture-of-Experts model designed for long-horizon coding tasks. The model features a 256K context window and reduces thinking token usage by approximately 30% compared to its predecessor K2.6.
Comments
Loading...