NVIDIA Nemotron 3 Super now available on Amazon Bedrock with 256K context window
NVIDIA Nemotron 3 Super, a hybrid Mixture of Experts model with 120B parameters and 12B active parameters, is now available as a fully managed model on Amazon Bedrock. The model supports up to a 256K-token context length, and NVIDIA claims 5x higher throughput efficiency than the previous Nemotron Super and 2x higher accuracy on reasoning tasks.
NVIDIA Nemotron 3 Super launches on Amazon Bedrock
NVIDIA Nemotron 3 Super is now available as a fully managed, serverless model on Amazon Bedrock, joining the existing Nemotron Nano offerings. The model uses a hybrid Mixture of Experts (MoE) architecture optimized for agentic AI systems and multi-agent workflows without requiring infrastructure management.
Model specifications
Nemotron 3 Super is a 120B parameter model with 12B active parameters per token, using a latent MoE design that enables 4x more experts at the same inference cost. The model supports:
- Context length: 256K tokens
- Architecture: Hybrid Transformer-Mamba with latent MoE
- Active parameters: 12B (4x cost efficiency for inference)
- Input/output: Text only
- Supported languages: English, French, German, Italian, Japanese, Spanish, Chinese
- Multi-token prediction: Enabled for faster long reasoning sequences
Performance claims
NVIDIA claims the model achieves:
- 5x higher throughput efficiency over previous Nemotron Super
- 2x higher accuracy on reasoning and agentic tasks compared to the prior version
- Leading performance on AIME 2025, Terminal-Bench, SWE-Bench Verified, RULER, and multilingual benchmarks
- Token-budget controls that preserve accuracy while limiting the number of reasoning tokens generated
The model was trained using multi-environment reinforcement learning across 10+ environments via NVIDIA NeMo, according to the company.
Key capabilities
The model is positioned for use cases including:
- Software development: Code generation and summarization
- Finance: Loan processing, data extraction, fraud detection
- Cybersecurity: Issue triage, malware analysis, threat hunting
- Search: User intent understanding and agent activation
- Retail: Inventory optimization and personalized recommendations
- Multi-agent workflows: Orchestrating task-specific agents for complex business processes
Access and pricing
The model is available through Amazon Bedrock's Chat playground and programmatically via the model ID nvidia.nemotron-super-3-120b. It supports:
- Amazon Bedrock console interface
- InvokeModel and Converse APIs
- AWS CLI and SDKs
- OpenAI SDK compatibility
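As a concrete sketch of the Converse API path, the snippet below uses the model ID from this announcement via boto3's bedrock-runtime client; the Region, prompt, and inference settings are assumptions, not values from the announcement:

```python
MODEL_ID = "nvidia.nemotron-super-3-120b"  # model ID from the announcement

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Keyword arguments for the bedrock-runtime Converse API."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

def ask(prompt: str, region: str = "us-east-1") -> str:
    # boto3 is imported here so the request builder stays dependency-free;
    # the Region is an assumption -- use one where the model is available.
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(**build_converse_request(prompt))
    return response["output"]["message"]["content"][0]["text"]

# Usage (requires AWS credentials with Bedrock model access):
#   print(ask("In two sentences, what is a Mixture-of-Experts model?"))
```

The same request shape works with InvokeModel, but Converse normalizes message formatting across Bedrock models, so switching models later requires fewer changes.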
AWS has not disclosed specific per-1M-token pricing for input and output.
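For teams already on the OpenAI SDK, a minimal sketch follows. It assumes Bedrock's OpenAI-compatible endpoint uses the bedrock-runtime.<region>.amazonaws.com/openai/v1 base-URL pattern and accepts a Bedrock API key as the bearer token; verify both against the AWS documentation before use:

```python
MODEL_ID = "nvidia.nemotron-super-3-120b"  # model ID from the announcement

def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Chat Completions keyword arguments accepted by the OpenAI SDK."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str, region: str = "us-east-1") -> str:
    # Assumptions: the base-URL pattern below and API-key auth match
    # Bedrock's OpenAI-compatible Chat Completions endpoint.
    from openai import OpenAI
    client = OpenAI(
        base_url=f"https://bedrock-runtime.{region}.amazonaws.com/openai/v1",
        api_key="<your-bedrock-api-key>",
    )
    resp = client.chat.completions.create(**build_chat_request(prompt))
    return resp.choices[0].message.content
```

This path is useful when migrating existing OpenAI-based agent code, since only the base URL, key, and model name change.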
Technical details
Nemotron 3 Super uses latent MoE, where experts operate on shared latent representations before outputs project back to token space. This approach enables better specialization around semantic structures and multi-hop reasoning patterns. Multi-token prediction (MTP) allows the model to predict multiple future tokens in a single forward pass, reducing latency for chain-of-thought, planning, and code generation.
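The latent-MoE idea described above can be illustrated with a toy sketch (an assumption-based illustration, not NVIDIA's implementation): tokens are projected into a shared latent space, a router selects top-k experts that operate there, and the weighted mixture projects back to token space.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 8, 4, 6, 2

# Shared down/up projections and per-expert latent transforms;
# sizes and scaling are illustrative only.
W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)
router = rng.normal(size=(d_latent, n_experts))
experts = [rng.normal(size=(d_latent, d_latent)) / np.sqrt(d_latent)
           for _ in range(n_experts)]

def latent_moe(x):
    z = x @ W_down                       # project token into shared latent space
    logits = z @ router                  # route in latent space, not token space
    top = np.argsort(logits)[-top_k:]    # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                         # softmax over the selected experts
    z_out = sum(wi * (z @ experts[i]) for wi, i in zip(w, top))
    return z_out @ W_up                  # project the mixture back to token space

y = latent_moe(rng.normal(size=d_model))
```

Because experts act on the smaller latent dimension, more experts fit in the same compute budget, which is the intuition behind the "4x more experts at the same inference cost" claim.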
The model is released with open weights, datasets, and training recipes, enabling developers to customize and deploy locally for enhanced privacy and security.
What this means
Bedrock now offers a high-efficiency reasoning model positioned against proprietary alternatives for agentic workflows. The 12B active-parameter footprint suggests competitive inference costs, while the 256K context enables longer reasoning chains. The open-weights release differentiates it from closed models, though actual latency, throughput, and cost versus competing models on Bedrock (Claude, Llama) remain unverified. Organizations building multi-agent systems should benchmark it against existing options, particularly on reasoning-heavy tasks where the claimed 2x accuracy improvement applies.