product updateNVIDIA

NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window

TL;DR

NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to 256K token context length and claims 5x higher throughput efficiency over the previous Nemotron Super and 2x higher accuracy on reasoning tasks.

2 min read
0

NVIDIA Nemotron 3 Super launches on Amazon Bedrock

NVIDIA Nemotron 3 Super is now available as a fully managed, serverless model on Amazon Bedrock, joining the existing Nemotron Nano offerings. The model uses a hybrid Mixture of Experts (MoE) architecture optimized for agentic AI systems and multi-agent workflows without requiring infrastructure management.

Model specifications

Nemotron 3 Super is a 120B parameter model with 12B active parameters per token, using a latent MoE design that enables 4x more experts at the same inference cost. The model supports:

  • Context length: 256K tokens
  • Architecture: Hybrid Transformer-Mamba with latent MoE
  • Active parameters: 12B (4x cost efficiency for inference)
  • Input/output: Text only
  • Supported languages: English, French, German, Italian, Japanese, Spanish, Chinese
  • Multi-token prediction: Enabled for faster long reasoning sequences

Performance claims

NVIDIA claims the model achieves:

  • 5x higher throughput efficiency over previous Nemotron Super
  • 2x higher accuracy on reasoning and agentic tasks compared to the prior version
  • Leading performance on AIME 2025, Terminal-Bench, SWE-Bench Verified, RULER, and multilingual benchmarks
  • Token budget support for improved accuracy with minimal reasoning token generation

The model was trained using multi-environment reinforcement learning across 10+ environments via NVIDIA NeMo, according to the company.

Key capabilities

The model is positioned for use cases including:

  • Software development: Code generation and summarization
  • Finance: Loan processing, data extraction, fraud detection
  • Cybersecurity: Issue triage, malware analysis, threat hunting
  • Search: User intent understanding and agent activation
  • Retail: Inventory optimization and personalized recommendations
  • Multi-agent workflows: Orchestrating task-specific agents for complex business processes

Access and pricing

The model is available through Amazon Bedrock's Chat playground and programmatically via the model ID nvidia.nemotron-super-3-120b. It supports:

  • Amazon Bedrock console interface
  • InvokeModel and Converse APIs
  • AWS CLI and SDKs
  • OpenAI SDK compatibility

Specific pricing per 1M tokens for input and output has not been disclosed by AWS.

Technical details

Nemotron 3 Super uses latent MoE, where experts operate on shared latent representations before outputs project back to token space. This approach enables better specialization around semantic structures and multi-hop reasoning patterns. Multi-token prediction (MTP) allows the model to predict multiple future tokens in a single forward pass, reducing latency for chain-of-thought, planning, and code generation.

The model is released with open weights, datasets, and training recipes, enabling developers to customize and deploy locally for enhanced privacy and security.

What this means

Bedrock now offers a high-efficiency reasoning model positioned against proprietary alternatives for agentic workflows. The 12B active parameters claim suggests competitive inference costs while the 256K context enables longer reasoning chains. The open-weights approach differentiates from closed models, though actual latency, throughput, and cost metrics versus competing models on Bedrock (Claude, Llama) remain unverified. Organizations building multi-agent systems should benchmark this against existing options, particularly for reasoning-heavy tasks where the 2x accuracy improvement claim applies.

Related Articles

product update

Mistral AI Launches Forge for Enterprise Model Training on Proprietary Data

Mistral AI has launched Forge, a platform that allows enterprises to train custom AI models on their proprietary data including codebases, compliance policies, and operational documentation. The system supports both dense and mixture-of-experts architectures with pre-training, post-training, and reinforcement learning capabilities.

product update

Google expands Gemini Android overlay menu with six new tools accessible without opening app

Google has expanded the Gemini overlay plus menu on Android to include six tools: Videos, Music, Canvas, and Guided Learning join the existing Images and Personal Intelligence options. The update, rolling out in Google app version 17.32, allows users to access most Gemini features from anywhere on Android without opening the full app.

product update

Trail of Bits and OpenAI's Daybreak initiative produce 64 pull requests across 19 open-source projects in one week using

Trail of Bits launched Patch the Planet, a security initiative using OpenAI's GPT-5.5-Cyber model to find and fix bugs in critical open-source projects. The first week produced 64 pull requests and 51 issues across 19 projects including cURL, Python, PyPI, and Sigstore, with 37 patches already merged.

product update

Tencent tests AI assistant Xiaowei in WeChat's 1.4 billion user base

Tencent is testing an AI assistant called Xiaowei in Weixin, the Chinese version of WeChat, which has over 1.4 billion monthly active users combined with WeChat. Users can interact with Xiaowei through text or voice, communicate with friends, and launch mini-programs within the app.

Comments

Loading...