product update

Augment Code launches Prism router: 20-30% cost reduction routing between Claude Opus 4.7, GPT 5.5, and cheaper models

TL;DR

Augment Code released Prism, a model routing system that selects between frontier models and cheaper alternatives on each user turn. On Augment's internal benchmark, Prism matches Claude Opus 4.7 and GPT 5.5 quality while reducing per-task costs by 20-30%, which the company says translates to approximately $20,000 in monthly savings for a team sending 10,000 requests per month.

May 2, 2026 · 6:50 PM · 2 min read

Augment Code released Prism, a model routing system that dynamically selects between frontier reasoning models and cheaper alternatives on a per-turn basis. According to the company, Prism reduces per-task costs by 20-30% compared to using a frontier model alone, while maintaining equivalent quality on its internal multi-turn coding benchmark.

Routing configurations and cost savings

Prism offers two configurations:

Prism (Claude + Gemini): Routes between Claude Opus 4.7, Claude Sonnet 4.6, and Gemini Flash 3.0. Cost per task: $4.91 vs. $6.81 for Opus 4.7 alone (28% reduction)
Prism (GPT + Kimi): Routes between GPT 5.5, GPT 5.4, and Kimi K2.6. Cost per task: $5.25 vs. $7.31 for GPT 5.5 alone (28% reduction)

For a team sending 10,000 user requests monthly, Augment claims this translates to approximately $20,000 in savings.
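The arithmetic behind these figures can be checked with a short sketch. It assumes that each monthly request costs the same as one benchmark task at the per-task prices quoted above; that mapping is an assumption for illustration, and the function names are hypothetical.

```python
# Per-task costs in USD, as quoted above for the internal benchmark.
CONFIGS = {
    "Prism (Claude + Gemini)": {"prism": 4.91, "baseline": 6.81},
    "Prism (GPT + Kimi)": {"prism": 5.25, "baseline": 7.31},
}


def reduction_pct(prism: float, baseline: float) -> float:
    """Relative cost reduction of the routed setup versus the single model."""
    return (baseline - prism) / baseline * 100


def monthly_savings(prism: float, baseline: float, requests: int) -> float:
    """Absolute savings, assuming one request costs as much as one task."""
    return (baseline - prism) * requests


for name, cost in CONFIGS.items():
    pct = reduction_pct(cost["prism"], cost["baseline"])
    saved = monthly_savings(cost["prism"], cost["baseline"], 10_000)
    print(f"{name}: {pct:.1f}% cheaper, ${saved:,.0f} saved per 10,000 requests")
# Prism (Claude + Gemini): 27.9% cheaper, $19,000 saved per 10,000 requests
# Prism (GPT + Kimi): 28.2% cheaper, $20,600 saved per 10,000 requests
```

Under that assumption, both configurations land close to the quoted 28% reduction and the roughly $20,000 monthly figure.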

Internal benchmark performance

On Augment's proprietary multi-turn Go repository benchmark:

Prism (GPT + Kimi): +0.30 average score (80% CI: ±0.14)
GPT 5.5: +0.21 (±0.05)
Prism (Claude + Gemini): +0.11 (±0.08)
Opus 4.7: +0.08 (±0.10)
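
One rough way to read these scores is to check whether each router's 80% interval overlaps that of the frontier model it is compared against. The sketch below does only that; overlapping intervals are an informal indication that the scores are hard to tell apart, not a formal significance test, and the helper names are hypothetical.

```python
# (mean score, half-width of the 80% CI), as listed above.
SCORES = {
    "Prism (GPT + Kimi)": (0.30, 0.14),
    "GPT 5.5": (0.21, 0.05),
    "Prism (Claude + Gemini)": (0.11, 0.08),
    "Opus 4.7": (0.08, 0.10),
}


def interval(mean: float, half_width: float) -> tuple[float, float]:
    """Lower and upper bound, rounded to avoid floating-point noise."""
    return (round(mean - half_width, 2), round(mean + half_width, 2))


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


PAIRS = [
    ("Prism (GPT + Kimi)", "GPT 5.5"),
    ("Prism (Claude + Gemini)", "Opus 4.7"),
]

for router, baseline in PAIRS:
    r = interval(*SCORES[router])
    b = interval(*SCORES[baseline])
    print(f"{router} {r} vs {baseline} {b}: overlap={overlaps(r, b)}")
# Prism (GPT + Kimi) (0.16, 0.44) vs GPT 5.5 (0.16, 0.26): overlap=True
# Prism (Claude + Gemini) (0.03, 0.19) vs Opus 4.7 (-0.02, 0.18): overlap=True
```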

The benchmark converts historical pull requests into multi-message developer conversations spanning multiple difficulty levels. An LLM judge scores agent-generated diffs against original PRs on correctness, completeness, and code quality.

External benchmark results

On Terminal Bench 2.0:

Prism (GPT + Kimi): 75.7% pass rate, $0.68 per task (17% cheaper than GPT 5.5)
GPT 5.5: 76.0% pass rate, $0.82 per task
Prism (Claude + Gemini): 64.0% pass rate, $0.89 per task (tied with Opus 4.7 on quality, 5% more expensive due to token overhead)

On SWE-Bench Pro (731 instances):

Opus 4.7: 61.8% pass rate, $1.98 per instance
Prism (Claude + Gemini): 59.5% pass rate, $1.85 per instance (6% cheaper)
GPT 5.5: 53.6% pass rate, $2.15 per instance
Prism (GPT + Kimi): 52.9% pass rate, $1.88 per instance (12% cheaper than GPT 5.5)
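
The SWE-Bench Pro trade-off can be restated as a pass-rate difference in percentage points alongside a relative cost saving. The sketch below recomputes both from the rounded figures listed above, so the savings come out slightly higher (6.6% and 12.6%) than the quoted 6% and 12%; the gap presumably reflects rounding in the published numbers, which is an assumption. The `compare` helper is hypothetical.

```python
# (pass rate in %, cost per instance in USD) from the SWE-Bench Pro list above.
RESULTS = {
    "Opus 4.7": (61.8, 1.98),
    "Prism (Claude + Gemini)": (59.5, 1.85),
    "GPT 5.5": (53.6, 2.15),
    "Prism (GPT + Kimi)": (52.9, 1.88),
}


def compare(router: str, baseline: str) -> dict[str, float]:
    """Pass-rate delta (percentage points) and cost saving (%) versus baseline."""
    r_pass, r_cost = RESULTS[router]
    b_pass, b_cost = RESULTS[baseline]
    return {
        "pass_delta_pts": round(r_pass - b_pass, 1),
        "cost_saving_pct": round((b_cost - r_cost) / b_cost * 100, 1),
    }


print(compare("Prism (Claude + Gemini)", "Opus 4.7"))
# {'pass_delta_pts': -2.3, 'cost_saving_pct': 6.6}
print(compare("Prism (GPT + Kimi)", "GPT 5.5"))
# {'pass_delta_pts': -0.7, 'cost_saving_pct': 12.6}
```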

Cache-aware routing mechanism

Prism uses a small planner model to select the underlying model before each user turn. The system avoids frequent model switches to preserve prompt cache efficiency, because switching models evicts the cache and increases costs by approximately 10×.
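
To illustrate the idea of limiting switches, here is a minimal sketch of one possible policy: follow the planner's preference, but only after a cool-down of several turns on the current model. This is not Augment's implementation. The `StickyRouter` class, the cool-down rule, and the default of three turns are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StickyRouter:
    """Hypothetical cache-aware router; not Augment's actual design."""

    current: str
    min_turns_between_switches: int = 3  # assumed cool-down, not from the source
    turns_since_switch: int = 0

    def route(self, planner_choice: str) -> str:
        """Return the model to use this turn, given the planner's preference."""
        self.turns_since_switch += 1
        wants_switch = planner_choice != self.current
        cooled_down = self.turns_since_switch >= self.min_turns_between_switches
        if wants_switch and cooled_down:
            # Switching discards the warm prompt cache, so do it sparingly.
            self.current = planner_choice
            self.turns_since_switch = 0
        return self.current


router = StickyRouter(current="frontier")
planner_choices = ["cheap", "cheap", "cheap", "frontier", "cheap", "cheap"]
print([router.route(choice) for choice in planner_choices])
# ['frontier', 'frontier', 'cheap', 'cheap', 'cheap', 'cheap']
```

In this run the single "frontier" request on the fourth turn is ignored because the router has only just switched, which keeps the cheaper model's cache warm. A production policy would more likely weigh the expected savings against the cost of re-priming the cache rather than use a fixed turn count.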

According to Augment's analysis of IDE agent traffic, the top 10% of user turns consumed 57% of all LLM rounds, while most turns required lighter processing but were billed at frontier rates because users defaulted to the strongest model.
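
A concentration figure like this can be computed from per-turn round counts. The sketch below shows the calculation on a small synthetic list; the numbers are invented for illustration and are not Augment's traffic data, and `top_share` is a hypothetical helper.

```python
def top_share(rounds_per_turn: list[int], top_fraction: float = 0.10) -> float:
    """Fraction of all LLM rounds consumed by the heaviest turns."""
    ordered = sorted(rounds_per_turn, reverse=True)
    # Always include at least one turn in the "top" group.
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)


# Synthetic example: 20 turns, a few of them very heavy.
sample = [40, 12, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(f"Top 10% of turns use {top_share(sample):.0%} of rounds")
# Top 10% of turns use 62% of rounds
```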

What this means

Prism addresses a real cost problem in production AI coding agents: teams overpay by routing all requests through frontier models even when cheaper alternatives suffice. The 20-30% cost reduction is meaningful at scale, though the quality-cost tradeoff depends heavily on benchmark selection. Single-task benchmarks like SWE-Bench Pro show minimal routing advantages since most problems require frontier capabilities. The internal multi-turn benchmark showing larger gains may reflect more realistic workload diversity, but generalization beyond Go codebases remains unverified. Cache-aware routing is the technical innovation here: naive per-request routing would lose its savings to cache evictions.

Source: augmentcode.com

