NVIDIA Nemotron 3 Nano now available on Amazon Bedrock as serverless model
Amazon Bedrock now offers NVIDIA's Nemotron 3 Nano as a fully managed serverless model, expanding its Nemotron portfolio alongside the previously available Nemotron 2 Nano 9B and Nemotron 2 Nano VL 12B variants. The addition lets developers deploy NVIDIA's compact, inference-optimized model without provisioning or managing infrastructure.
Model Availability and Architecture
Nemotron 3 Nano represents NVIDIA's latest generation of efficiency-focused language models designed for resource-constrained deployments. The model is now accessible through Bedrock's serverless inference interface, eliminating the need for customers to provision, manage, or scale infrastructure.
The Nemotron family includes:
- Nemotron 2 Nano 9B: General-purpose language model
- Nemotron 2 Nano VL 12B: Vision-language variant supporting multimodal inputs
- Nemotron 3 Nano: Latest iteration (specific parameter count and context window not disclosed)
Use Cases and Integration
According to AWS, Nemotron 3 Nano targets applications requiring cost-effective inference at scale, including:
- Custom model fine-tuning through Bedrock's managed training capabilities
- Real-time text generation with minimal latency requirements
- Deployment patterns where model size and inference speed are prioritized over maximum capability
The serverless architecture abstracts infrastructure management, allowing developers to focus on application logic rather than container orchestration or GPU provisioning.
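For latency-sensitive use cases such as real-time text generation, Bedrock's standard streaming interface applies to serverless models. The sketch below uses boto3's ConverseStream API; the model ID is a hypothetical placeholder, since AWS has not published the Nemotron 3 Nano identifier here, so check the Bedrock console for the actual value in your region.

```python
# Hedged sketch: streaming tokens from a serverless Bedrock model.
# MODEL_ID is a placeholder, not a confirmed Nemotron 3 Nano identifier.

MODEL_ID = "nvidia.nemotron-3-nano-v1:0"  # hypothetical placeholder

def extract_text(event: dict) -> str:
    """Pull the text delta out of one ConverseStream event, if present."""
    return event.get("contentBlockDelta", {}).get("delta", {}).get("text", "")

def stream_completion(prompt: str):
    import boto3  # imported lazily so the helper above is testable offline
    client = boto3.client("bedrock-runtime")  # no endpoint to provision
    response = client.converse_stream(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # The stream yields event dicts; only some carry a text delta.
    for event in response["stream"]:
        chunk = extract_text(event)
        if chunk:
            yield chunk
```

Because the endpoint is serverless, this is the entirety of the client-side work: there is no cluster, container, or GPU configuration preceding the call.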
Technical Guidance
AWS has published technical documentation covering:
- Model invocation through the Bedrock API
- Cost optimization strategies for production workloads
- Integration patterns with applications requiring custom orchestration
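A minimal non-streaming invocation follows Bedrock's standard Converse API pattern, shown below as a sketch. The model ID is again a hypothetical placeholder (AWS has not disclosed the actual identifier in this announcement), and the inference parameters are illustrative defaults.

```python
# Minimal sketch of invoking a serverless Bedrock model via the Converse API.
# MODEL_ID is a hypothetical placeholder, not a confirmed identifier.

MODEL_ID = "nvidia.nemotron-3-nano-v1:0"  # hypothetical placeholder

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a Converse API request body for a single-turn prompt."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

def invoke(prompt: str) -> str:
    import boto3  # lazy import; build_request stays testable offline
    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_request(prompt))
    return response["output"]["message"]["content"][0]["text"]
```

The Converse API is model-agnostic, so swapping Nemotron 3 Nano for another Bedrock model is a one-line change to the model ID, which is what enables the workload-specific model selection discussed below.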
Pricing specifics for Nemotron 3 Nano on Bedrock have not yet been disclosed. AWS typically charges serverless models per input and output token, with rates varying by model and region.
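Once rates are published, per-token pricing makes cost projection simple arithmetic. The sketch below is a back-of-envelope cost model; the per-1K-token rates are illustrative placeholders, not published Bedrock prices.

```python
# Back-of-envelope cost model for token-based serverless pricing.
# The per-1K-token rates are ILLUSTRATIVE placeholders -- substitute
# real rates once AWS publishes Nemotron 3 Nano pricing.

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_1k: float = 0.0001,
                  out_rate_per_1k: float = 0.0004) -> float:
    """Return estimated USD cost for one request at the given rates."""
    return (input_tokens / 1000) * in_rate_per_1k \
         + (output_tokens / 1000) * out_rate_per_1k

# e.g. a month of 10M requests averaging 500 input / 200 output tokens:
monthly_usd = 10_000_000 * estimate_cost(500, 200)
```

For small models like the Nano family, output tokens typically dominate cost at these asymmetric rates, which is why compact models are attractive for high-volume generation workloads.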
Strategic Context
The addition reflects AWS's broader effort to expand Bedrock's model portfolio beyond proprietary offerings. Previous announcements included Anthropic's Claude family, Meta's Llama models, and open-weight options through partnerships. NVIDIA's Nemotron series complements this strategy with a lightweight option for cost-sensitive production workloads.
Bedrock's serverless model approach competes with alternative deployment paths:
- Self-hosted inference via EC2 or container services
- Alternative managed endpoints like SageMaker
- Direct NVIDIA deployment tools like NVIDIA NIM
What This Means
The availability of Nemotron 3 Nano lowers the barrier to deploying compact models in production AWS environments. For organizations already using Bedrock, another model option reduces vendor lock-in and enables workload-specific model selection. However, Nemotron 3 Nano's technical specifications (parameter count, context window, training cutoff date) remain undisclosed, limiting detailed capability comparison with alternatives such as Llama 3.2 1B and other sub-10B models available on Bedrock.
Related Articles
NVIDIA Nemotron 3 Nano Omni: 30B-parameter multimodal model launches on AWS SageMaker with 131K token context
NVIDIA has launched Nemotron 3 Nano Omni on Amazon SageMaker JumpStart, a multimodal model with 30 billion total parameters (3 billion active) that processes video, audio, images, and text in a single inference pass. The model features a 131K token context window and uses a Mamba2 Transformer Hybrid MoE architecture combining three specialized encoders.
NVIDIA Releases Nemotron 3 Nano Omni: 31B Multimodal Model With 256K Context and Reasoning Mode
NVIDIA released Nemotron 3 Nano Omni, a 31B-parameter multimodal model (roughly 3B parameters active per token) supporting video, audio, image, and text inputs. The model features a 256K token context window, a reasoning mode with chain-of-thought, and tool-calling capabilities.
AWS Launches Serverless MCP Proxy on Bedrock AgentCore Runtime for Custom Agent Controls
AWS has released support for custom Model Context Protocol (MCP) proxies on Amazon Bedrock AgentCore Runtime, allowing organizations to implement custom governance and security controls on AI agent tool interactions without modifying upstream MCP servers. The serverless proxy runs on AgentCore Runtime with automatic scaling and built-in observability through CloudWatch and OpenTelemetry.