product update

NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window

TL;DR

NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.


NVIDIA Nemotron 3 Super launches on Amazon Bedrock

NVIDIA Nemotron 3 Super is now available as a fully managed, serverless model on Amazon Bedrock, joining the existing Nemotron Nano offerings. The model uses a hybrid Mixture of Experts (MoE) architecture optimized for agentic AI systems and multi-agent workflows without requiring infrastructure management.

Model specifications

Nemotron 3 Super is a 120B-parameter model with 12B active parameters per token, using a latent MoE design that enables 4x more experts at the same inference cost. The model supports:

  • Context length: 256K tokens
  • Architecture: Hybrid Transformer-Mamba with latent MoE
  • Active parameters: 12B (4x cost efficiency for inference)
  • Input/output: Text only
  • Supported languages: English, French, German, Italian, Japanese, Spanish, Chinese
  • Multi-token prediction: Enabled, for faster generation of long reasoning sequences

Performance claims

NVIDIA claims the model achieves:

  • 5x higher throughput efficiency over the previous Nemotron Super
  • 2x higher accuracy on reasoning and agentic tasks compared to the prior version
  • Leading performance on AIME 2025, Terminal-Bench, SWE-Bench Verified, RULER, and multilingual benchmarks
  • Token budget support for improved accuracy with minimal reasoning token generation

The model was trained using multi-environment reinforcement learning across 10+ environments via NVIDIA NeMo, according to the company.

Key capabilities

The model is positioned for use cases including:

  • Software development: Code generation and summarization
  • Finance: Loan processing, data extraction, fraud detection
  • Cybersecurity: Issue triage, malware analysis, threat hunting
  • Search: User intent understanding and agent activation
  • Retail: Inventory optimization and personalized recommendations
  • Multi-agent workflows: Orchestrating task-specific agents for complex business processes

Access and pricing

The model is available through Amazon Bedrock's Chat playground and programmatically via the model ID nvidia.nemotron-super-3-120b. It supports:

  • Amazon Bedrock console interface
  • InvokeModel and Converse APIs
  • AWS CLI and SDKs
  • OpenAI SDK compatibility
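As a sketch of programmatic access, the snippet below builds a request body for Bedrock's Converse API using the model ID from the announcement. The inference parameters (max tokens, temperature) and the region in the commented-out call are illustrative assumptions, not values from the announcement; sending the request requires AWS credentials and model access in your account.

```python
import json

# Model ID from the announcement.
MODEL_ID = "nvidia.nemotron-super-3-120b"

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build keyword arguments for the Bedrock Converse API."""
    return {
        "modelId": MODEL_ID,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]}
        ],
        # Illustrative inference settings; tune for your workload.
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

request = build_converse_request("Summarize the purpose of this function: ...")
print(json.dumps(request, indent=2))

# To actually send the request (needs AWS credentials and model access;
# region is an assumption):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```

The same `converse` call shape works across Bedrock models, which is why the Converse API is generally preferred over model-specific InvokeModel payloads when switching between models.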

AWS has not disclosed per-million-token pricing for input or output.

Technical details

Nemotron 3 Super uses latent MoE, where experts operate on shared latent representations before outputs project back to token space. This approach enables better specialization around semantic structures and multi-hop reasoning patterns. Multi-token prediction (MTP) allows the model to predict multiple future tokens in a single forward pass, reducing latency for chain-of-thought, planning, and code generation.

The model is released with open weights, datasets, and training recipes, enabling developers to customize and deploy locally for enhanced privacy and security.

What this means

Bedrock now offers a high-efficiency reasoning model positioned against proprietary alternatives for agentic workflows. The 12B active-parameter design suggests competitive inference costs, while the 256K context window enables longer reasoning chains. The open-weights approach differentiates it from closed models, though actual latency, throughput, and cost versus competing models on Bedrock (Claude, Llama) remain unverified. Organizations building multi-agent systems should benchmark this model against existing options, particularly for reasoning-heavy tasks where the claimed 2x accuracy improvement applies.

Related Articles

product update

OpenAI adds Trusted Contact feature to alert emergency contacts when ChatGPT detects self-harm discussions

OpenAI launched an optional Trusted Contact feature for ChatGPT that notifies designated emergency contacts when the system detects discussions about self-harm or suicide. The feature requires manual review by trained personnel before sending notifications, and does not share chat transcripts with contacts.

product update

Anthropic adds dreaming, outcomes, and multiagent orchestration to Claude Managed Agents

Anthropic has released three new capabilities for Claude Managed Agents: dreaming (research preview) for pattern recognition and self-improvement, outcomes for defining success criteria with automated evaluation, and multiagent orchestration for delegating tasks to specialist agents.

product update

AWS launches Amazon Bedrock AgentCore Payments with Coinbase and Stripe for autonomous agent transactions

AWS announced Amazon Bedrock AgentCore Payments (preview), enabling AI agents to autonomously discover and pay for APIs, web content, MCP servers, and other agents. Built with Coinbase and Stripe, the service supports micropayments through the x402 protocol with per-session spending limits and full transaction observability.

product update

Google testing 'Gemini Agent' upgrade that takes actions across apps, makes purchases autonomously

Google is testing a major upgrade to Gemini Agent, internally called "Remy," that can autonomously take actions on users' behalf including making purchases, sharing documents, and communicating with others. The experimental feature, available to Google AI Ultra subscribers, will monitor user preferences and handle complex tasks proactively across connected apps.
