product update

Augment Code launches Prism router: 20-30% cost reduction routing between Claude Opus 4.7, GPT 5.5, and cheaper models

TL;DR

Augment Code released Prism, a model routing system that selects between frontier models and cheaper alternatives per user turn. On internal benchmarks, Prism matches Claude Opus 4.7 and GPT 5.5 quality while reducing per-task costs by 20-30%, translating to approximately $20,000 monthly savings for teams sending 10,000 requests.

2 min read
0

Augment Code launches Prism router: 20-30% cost reduction routing between Claude Opus 4.7, GPT 5.5, and cheaper models

Augment Code released Prism, a model routing system that dynamically selects between frontier reasoning models and cheaper alternatives on a per-turn basis. According to the company, Prism reduces per-task costs by 20-30% compared to frontier models while maintaining equivalent quality on their internal multi-turn coding benchmark.

Routing configurations and cost savings

Prism offers two configurations:

  • Prism (Claude + Gemini): Routes between Claude Opus 4.7, Claude Sonnet 4.6, and Gemini Flash 3.0. Cost per task: $4.91 vs. $6.81 for Opus 4.7 alone (28% reduction)
  • Prism (GPT + Kimi): Routes between GPT 5.5, GPT 5.4, and Kimi K2.6. Cost per task: $5.25 vs. $7.31 for GPT 5.5 alone (28% reduction)

For a team sending 10,000 user requests monthly, Augment claims this translates to approximately $20,000 in savings.

Internal benchmark performance

On Augment's proprietary multi-turn Go repository benchmark:

  • Prism (GPT + Kimi): +0.30 average score (80% CI: ±0.14)
  • GPT 5.5: +0.21 (±0.05)
  • Prism (Claude + Gemini): +0.11 (±0.08)
  • Opus 4.7: +0.08 (±0.10)

The benchmark converts historical pull requests into multi-message developer conversations spanning multiple difficulty levels. An LLM judge scores agent-generated diffs against original PRs on correctness, completeness, and code quality.

External benchmark results

On Terminal Bench 2.0:

  • Prism (GPT + Kimi): 75.7% pass rate, $0.68 per task (17% cheaper than GPT 5.5)
  • GPT 5.5: 76.0% pass rate, $0.82 per task
  • Prism (Claude + Gemini): 64.0% pass rate, $0.89 per task (tied with Opus 4.7 on quality, 5% more expensive due to token overhead)

On SWE-Bench Pro (731 instances):

  • Opus 4.7: 61.8% pass rate, $1.98 per instance
  • Prism (Claude + Gemini): 59.5% pass rate, $1.85 per instance (6% cheaper)
  • GPT 5.5: 53.6% pass rate, $2.15 per instance
  • Prism (GPT + Kimi): 52.9% pass rate, $1.88 per instance (12% cheaper than GPT 5.5)

Cache-aware routing mechanism

Prism uses a small planner model to select the underlying model before each user turn. The system avoids frequent model switches to preserve prompt cache efficiency—switching models evicts the cache and increases costs by approximately 10×.

According to Augment's analysis of IDE agent traffic, the top 10% of user turns consumed 57% of all LLM rounds, while most turns required lighter processing but were billed at frontier rates because users defaulted to the strongest model.

What this means

Prism addresses a real cost problem in production AI coding agents: teams overpay by routing all requests through frontier models even when cheaper alternatives suffice. The 20-30% cost reduction is meaningful at scale, though the quality-cost tradeoff depends heavily on benchmark selection. Single-task benchmarks like SWE-Bench Pro show minimal routing advantages since most problems require frontier capabilities. The internal multi-turn benchmark showing larger gains may reflect more realistic workload diversity, but generalization beyond Go codebases remains unverified. Cache-aware routing is the technical innovation here—naive per-request routing would lose savings to cache evictions.

Related Articles

product update

Google Pixel Drop to add Screen Reactions, Gemini Omni music generation in upcoming Android 17 update

Google's upcoming Pixel Drop will bring Screen Reactions and Gemini Omni-powered features to Pixel phones, according to promotional videos discovered on Amazon. The update, overdue based on Google's typical quarterly schedule following March 2026's release, appears to coincide with Android 17's stable release.

product update

Google rolls out Search agents for AI Ultra subscribers at $99.99-$199.99/month

Google has started rolling out information agents, the first type of Search agents announced at I/O 2026, exclusively for AI Ultra subscribers. The feature monitors blogs, news, social media, and real-time data sources 24/7 to deliver synthesized updates when conditions match user-specified queries.

product update

Google: 3.5 million users tested Gemini for Home assistant, speaker launch next week

Google announced that 3.5 million users have tested its Gemini for Home assistant since the early access program launched in October 2025. The company delivered over 2,500 bug fixes and 50 major feature updates based on user feedback, and is teasing a Google Home Speaker launch for next week.

product update

Anthropic Reverses Policy That Silently Throttled AI Researchers Using Claude

Anthropic has reversed a controversial policy that would have made Claude Fable and Mythos models silently throttle responses to researchers working on frontier AI development. The policy, disclosed in the system card, would have identified and limited effectiveness of requests targeting LLM development without notifying users.

Comments

Loading...