NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context
NVIDIA has launched Nemotron 3 Nano Omni on Amazon SageMaker JumpStart, a multimodal model with 30 billion total parameters (3 billion active) that processes video, audio, images, and text in a single inference pass. The model features a 131K token context window and uses a hybrid Mamba2-Transformer Mixture-of-Experts (MoE) architecture that pairs a language backbone with dedicated vision and speech encoders.
Technical specifications
The model uses a hybrid Mamba2-Transformer Mixture-of-Experts (MoE) architecture combining three components:
- Nemotron 3 Nano LLM: Language backbone
- CRADIO v4-H: Vision encoder for image and video understanding
- Parakeet: Speech encoder for audio transcription
Key specifications:
- Context window: 131,072 tokens
- Total parameters: 30 billion
- Active parameters: 3 billion (MoE)
- Precision: FP8 on SageMaker
- Video support: Up to 2 minutes, up to 256 frames (MP4)
- Audio support: Up to 1 hour, sampling rate of 8 kHz or higher (WAV, MP3)
- Image formats: JPEG, PNG (RGB)
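The input limits above lend themselves to a client-side pre-flight check before uploading media. The sketch below is a hypothetical validator (the names and structure are assumptions; the endpoint performs its own validation):

```python
# Hypothetical pre-flight validator against the published input limits.
# The endpoint enforces its own checks; this only mirrors the spec list above.
LIMITS = {
    "video": {"max_seconds": 120, "max_frames": 256, "formats": {"mp4"}},
    "audio": {"max_seconds": 3600, "min_sample_rate_hz": 8000,
              "formats": {"wav", "mp3"}},
    "image": {"formats": {"jpeg", "png"}},
}

def check_input(kind: str, fmt: str, seconds: float = 0, frames: int = 0,
                sample_rate_hz: int = 0) -> bool:
    """Return True if a media file appears to fit the published limits."""
    rules = LIMITS[kind]
    if fmt.lower() not in rules["formats"]:
        return False
    if "max_seconds" in rules and seconds > rules["max_seconds"]:
        return False
    if "max_frames" in rules and frames > rules["max_frames"]:
        return False
    if "min_sample_rate_hz" in rules and sample_rate_hz < rules["min_sample_rate_hz"]:
        return False
    return True
```

Rejecting oversized media locally avoids paying for an inference call that would fail anyway.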
The model supports chain-of-thought reasoning, tool calling, JSON output, and word-level timestamps for transcription tasks. It is licensed under the NVIDIA Open Model Agreement for commercial use.
Architecture approach
According to AWS and NVIDIA, the unified architecture addresses a common pain point in enterprise AI systems: most agentic workflows currently stitch together separate models for vision, speech, and language. This fragmented approach increases latency through repeated inference passes, complicates orchestration, and amplifies costs.
Nemotron 3 Nano Omni processes all modalities in a single reasoning loop, eliminating repeated model calls and keeping a unified multimodal context throughout.
Deployment and inference
The model is available through Amazon SageMaker JumpStart with one-click deployment. AWS recommends deploying on ml.p4d.24xlarge or ml.p5.48xlarge instances.
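The same deployment can be scripted with the SageMaker Python SDK. The sketch below assumes a JumpStart model ID (the ID shown is a hypothetical placeholder; the real one appears in the JumpStart console):

```python
# Sketch of a scripted JumpStart deployment with the SageMaker Python SDK.
# The default model_id is a hypothetical placeholder, not a confirmed ID.
RECOMMENDED_INSTANCES = ("ml.p4d.24xlarge", "ml.p5.48xlarge")

def deploy_nemotron(model_id: str = "nvidia-nemotron-3-nano-omni",  # hypothetical
                    instance_type: str = "ml.p4d.24xlarge"):
    if instance_type not in RECOMMENDED_INSTANCES:
        raise ValueError(f"AWS recommends one of {RECOMMENDED_INSTANCES}")
    # Imported inside the function so the sketch reads without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    # deploy() provisions a real endpoint and incurs AWS charges.
    return model.deploy(instance_type=instance_type, accept_eula=True)
```

Gating on the recommended instance types up front avoids provisioning hardware that cannot hold the model.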
Recommended inference parameters vary by mode:
- Thinking mode (complex reasoning): temperature 0.6, top_p 0.95, max_tokens 20,480
- Instruct mode (general tasks, ASR): temperature 0.2, max_tokens 1,024
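These presets can be wired into a request builder. Only the sampling parameters come from the article; the OpenAI-style "messages" body below is an assumption, not a documented contract for this endpoint:

```python
# Sampling presets taken from the recommendations above; the "messages"
# request schema is an assumed OpenAI-style body, not a documented contract.
MODES = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "max_tokens": 20480},
    "instruct": {"temperature": 0.2, "max_tokens": 1024},
}

def build_payload(prompt: str, mode: str = "instruct") -> dict:
    """Merge the user prompt with the recommended sampling parameters."""
    return {"messages": [{"role": "user", "content": prompt}], **MODES[mode]}

# A live call would then serialize this dict as the request Body, roughly:
#   boto3.client("sagemaker-runtime").invoke_endpoint(
#       EndpointName="<your-endpoint>", ContentType="application/json",
#       Body=json.dumps(build_payload("Transcribe this call.")))
```

Keeping the presets in one table makes it easy to switch between the low-temperature instruct mode for ASR and the longer-budget thinking mode for complex reasoning.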
Enterprise applications
NVIDIA and AWS highlight several use cases:
Computer use agents: Reading screens, understanding UI state over time, and validating outcomes for incident management dashboards, browser automation, and email workflow agents.
Document intelligence: Interpreting contracts, financial documents, and scientific literature with mixed visual and text content.
Audio and video understanding: Meeting recording analysis, media asset management, drive-thru order verification, and customer service video review.
What this means
Nemotron 3 Nano Omni represents NVIDIA's entry into the unified multimodal model space, directly competing with offerings like GPT-4V and Gemini. The 131K context window is competitive but not leading—Claude 3.5 Sonnet offers 200K tokens, and Gemini 1.5 Pro supports up to 2 million tokens. The MoE architecture with 3B active parameters aims to reduce inference costs while maintaining capability, though pricing per million tokens was not disclosed. The key differentiation is the single-pass multimodal processing specifically optimized for agentic workflows, which could reduce orchestration complexity for enterprises building AI agents that need to process multiple input types simultaneously.
Related Articles
Nvidia releases Nemotron 3 Nano Omni: 30B-parameter multimodal model with 256K context, free on OpenRouter
Nvidia has released Nemotron 3 Nano Omni, a 30-billion-parameter multimodal model available free on OpenRouter. The model features a 256,000-token context window, accepts text, image, video, and audio inputs, and claims 2× higher throughput for video reasoning compared to separate vision and speech pipelines.
NVIDIA Releases Nemotron 3 Nano Omni: 30B-A3B Multimodal Model With 100+ Page Document Support
NVIDIA released Nemotron 3 Nano Omni, a 30B-A3B Mixture-of-Experts model that processes text, images, video, and audio. The model uses a hybrid Mamba-Transformer architecture with 128 experts and achieves 65.8 on OCRBenchV2-En and 72.2 on Video-MME, while delivering up to 9x higher throughput on multimodal tasks compared to alternatives.
Xiaomi releases MiMo-V2.5: 310B parameter omnimodal model with 1M token context window
Xiaomi released MiMo-V2.5, a 310B total parameter sparse mixture-of-experts model that activates 15B parameters per token. The omnimodal model supports text, image, video, and audio understanding with a 1M token context window and was trained on 48T tokens using FP8 mixed precision.
Xiaomi Releases MiMo-V2.5-Pro: 1.02T Parameter MoE Model with 1M Context Window
Xiaomi has released MiMo-V2.5-Pro, an open-source Mixture-of-Experts model with 1.02 trillion total parameters and 42 billion active parameters. The model supports up to 1 million tokens context length and claims 99.6% on GSM8K and 86.2% on MATH benchmarks.