Augment Code launches Prism router: 20-30% cost reduction routing between Claude Opus 4.7, GPT 5.5, and cheaper models
Augment Code released Prism, a model routing system that chooses between frontier models and cheaper alternatives on each user turn. On Augment's internal benchmark, Prism matches Claude Opus 4.7 and GPT 5.5 quality while cutting per-task costs by 20-30%, which the company says works out to roughly $20,000 in savings per month for a team sending 10,000 requests a month.
Augment Code released Prism, a model routing system that dynamically selects between frontier reasoning models and cheaper alternatives on a per-turn basis. According to the company, Prism reduces per-task costs by 20-30% compared with frontier models while maintaining equivalent quality on its internal multi-turn coding benchmark.
Routing configurations and cost savings
Prism offers two configurations:
- Prism (Claude + Gemini): Routes between Claude Opus 4.7, Claude Sonnet 4.6, and Gemini Flash 3.0. Cost per task: $4.91 vs. $6.81 for Opus 4.7 alone (28% reduction)
- Prism (GPT + Kimi): Routes between GPT 5.5, GPT 5.4, and Kimi K2.6. Cost per task: $5.25 vs. $7.31 for GPT 5.5 alone (28% reduction)
For a team sending 10,000 user requests monthly, Augment claims this translates to approximately $20,000 in savings.
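The savings figures can be checked directly from the per-task costs above. This is a minimal sketch that assumes, as Augment's framing implies, that one user request corresponds to one task; the `savings` helper is a hypothetical name used only for illustration.

```python
# Per-task costs reported by Augment, in USD: (Prism, frontier model alone).
PER_TASK_COSTS = {
    "Prism (Claude + Gemini) vs Opus 4.7": (4.91, 6.81),
    "Prism (GPT + Kimi) vs GPT 5.5": (5.25, 7.31),
}


def savings(prism_cost: float, frontier_cost: float, tasks_per_month: int) -> tuple[float, float]:
    """Return (fractional cost reduction, monthly USD savings).

    Assumes one task per user request (an assumption, not Augment's stated method).
    """
    delta = frontier_cost - prism_cost
    return delta / frontier_cost, delta * tasks_per_month


for label, (prism, frontier) in PER_TASK_COSTS.items():
    reduction, monthly = savings(prism, frontier, tasks_per_month=10_000)
    print(f"{label}: {reduction:.0%} cheaper, about ${monthly:,.0f} saved per month")
```

With these inputs both configurations round to a 28% reduction, and the monthly savings come to about $19,000 (Claude + Gemini) and $20,600 (GPT + Kimi), consistent with the approximate $20,000 figure.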
Internal benchmark performance
On Augment's proprietary multi-turn Go repository benchmark:
- Prism (GPT + Kimi): +0.30 average score (80% CI: ±0.14)
- GPT 5.5: +0.21 (±0.05)
- Prism (Claude + Gemini): +0.11 (±0.08)
- Opus 4.7: +0.08 (±0.10)
The benchmark converts historical pull requests into multi-message developer conversations spanning multiple difficulty levels. An LLM judge scores agent-generated diffs against original PRs on correctness, completeness, and code quality.
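Augment does not say how its confidence intervals are computed. A common approach, assumed here, is a normal-approximation interval over per-task judge scores: mean ± z·s/√n, with z ≈ 1.2816 for a two-sided 80% interval. The per-task scores below are invented placeholders, and `intervals_overlap` is a rough screen rather than a formal significance test.

```python
import math
import statistics

# Two-sided 80% interval: z is the 90th percentile of the standard normal (~1.2816).
Z_80 = statistics.NormalDist().inv_cdf(0.90)


def mean_with_ci80(scores: list[float]) -> tuple[float, float]:
    """Return (mean, half-width) of a normal-approximation 80% confidence interval."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
    return mean, Z_80 * sem


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the intervals mean ± half-width for a and b intersect."""
    (m1, h1), (m2, h2) = a, b
    return abs(m1 - m2) <= h1 + h2


# Placeholder per-task judge scores, NOT Augment's data.
mean, half = mean_with_ci80([0.5, 0.1, 0.3, 0.4, 0.2])
print(f"{mean:.2f} ± {half:.2f}")

# Reported (mean, ±half-width) pairs from the list above.
print(intervals_overlap((0.30, 0.14), (0.21, 0.05)))  # Prism (GPT + Kimi) vs GPT 5.5: True
print(intervals_overlap((0.11, 0.08), (0.08, 0.10)))  # Prism (Claude + Gemini) vs Opus 4.7: True
```

Both Prism-versus-frontier pairs overlap on this screen, which fits Augment's "equivalent quality" framing better than a claim that Prism clearly outperforms the frontier models.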
External benchmark results
On Terminal Bench 2.0:
- Prism (GPT + Kimi): 75.7% pass rate, $0.68 per task (17% cheaper than GPT 5.5)
- GPT 5.5: 76.0% pass rate, $0.82 per task
- Prism (Claude + Gemini): 64.0% pass rate, $0.89 per task (tied with Opus 4.7 on quality, 5% more expensive due to token overhead)
On SWE-Bench Pro (731 instances):
- Opus 4.7: 61.8% pass rate, $1.98 per instance
- Prism (Claude + Gemini): 59.5% pass rate, $1.85 per instance (6% cheaper)
- GPT 5.5: 53.6% pass rate, $2.15 per instance
- Prism (GPT + Kimi): 52.9% pass rate, $1.88 per instance (12% cheaper than GPT 5.5)
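One way to read pass rate and cost together is cost per solved instance: average cost divided by pass rate, which charges each configuration for its failed attempts. This metric is an illustration, not one Augment reports; the inputs are the figures listed above. Opus 4.7's Terminal Bench cost is not given, so only the GPT pair is compared on that benchmark.

```python
from typing import NamedTuple


class Result(NamedTuple):
    pass_rate: float      # fraction of instances solved
    cost_per_task: float  # average USD per attempted instance


# Figures as reported in the article.
SWE_BENCH_PRO = {
    "Opus 4.7": Result(0.618, 1.98),
    "Prism (Claude + Gemini)": Result(0.595, 1.85),
    "GPT 5.5": Result(0.536, 2.15),
    "Prism (GPT + Kimi)": Result(0.529, 1.88),
}
TERMINAL_BENCH = {
    "GPT 5.5": Result(0.760, 0.82),
    "Prism (GPT + Kimi)": Result(0.757, 0.68),
}


def cost_per_solved(r: Result) -> float:
    """Total spend divided by instances solved, i.e. average cost / pass rate."""
    return r.cost_per_task / r.pass_rate


for bench, results in (("SWE-Bench Pro", SWE_BENCH_PRO), ("Terminal Bench 2.0", TERMINAL_BENCH)):
    for name, r in results.items():
        print(f"{bench} | {name}: ${cost_per_solved(r):.2f} per solved instance")
```

On SWE-Bench Pro this gives about $3.11 per solved instance for Prism (Claude + Gemini) versus $3.20 for Opus 4.7, and $3.55 for Prism (GPT + Kimi) versus $4.01 for GPT 5.5; on Terminal Bench the GPT pair comes to about $0.90 versus $1.08. By this measure Prism stays cheaper despite its slightly lower pass rates, though the margin is thin for the Claude pair.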
Cache-aware routing mechanism
Prism uses a small planner model to select the underlying model before each user turn. The system avoids frequent model switches to preserve prompt cache efficiency: prompt caches are not shared between models, so a switch discards the cached context, which Augment says increases costs by approximately 10×.
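Augment has not published how Prism decides when a switch is worth the cache penalty. The sketch below is a hypothetical rule, not Prism's algorithm: moves to a more capable model always happen, while moves to a cheaper model happen only when its estimated turn cost, including a cold cache on the first call, beats staying on the warm current model. The 0.1 cached-input multiplier mirrors the roughly 10× penalty cited above; the `Model` tiers and prices are invented, and the context is treated as a fixed size across rounds for simplicity.

```python
from dataclasses import dataclass

# Assumption: cached input tokens cost ~1/10 of uncached ones (the ~10x penalty above).
CACHE_READ_MULTIPLIER = 0.1


@dataclass(frozen=True)
class Model:
    name: str
    tier: int            # hypothetical capability rank; higher = stronger
    input_price: float   # hypothetical USD per 1M uncached input tokens
    output_price: float  # hypothetical USD per 1M output tokens


def turn_cost(model: Model, context_tokens: int, output_tokens: int,
              rounds: int, cold_start: bool) -> float:
    """Estimated USD cost of one user turn made of `rounds` LLM calls.

    Simplification: each call re-reads a context of the same size, and only the
    first call after a model switch misses the prompt cache.
    """
    warm_rate = model.input_price * CACHE_READ_MULTIPLIER
    first_rate = model.input_price if cold_start else warm_rate
    input_cost = context_tokens * (first_rate + warm_rate * (rounds - 1))
    output_cost = output_tokens * rounds * model.output_price
    return (input_cost + output_cost) / 1_000_000


def route(current: Model, planned: Model, context_tokens: int,
          output_tokens: int, rounds: int) -> Model:
    """Hypothetical cache-aware rule applied to the planner's suggested model."""
    if planned == current:
        return current
    if planned.tier > current.tier:
        return planned  # the turn needs a stronger model: accept the cache miss
    stay = turn_cost(current, context_tokens, output_tokens, rounds, cold_start=False)
    switch = turn_cost(planned, context_tokens, output_tokens, rounds, cold_start=True)
    return planned if switch < stay else current


frontier = Model("frontier", tier=3, input_price=5.0, output_price=25.0)
cheap = Model("cheap", tier=1, input_price=1.0, output_price=3.0)

print(route(frontier, cheap, 200_000, 1_000, rounds=1).name)   # frontier
print(route(frontier, cheap, 200_000, 1_000, rounds=10).name)  # cheap
```

With these placeholder numbers, a one-round light turn stays on the frontier model because reading the full context uncached costs more than the cheaper model saves, while a ten-round light turn switches, since the cache miss is paid once and amortized over the remaining rounds.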
According to Augment's analysis of IDE agent traffic, the top 10% of user turns consumed 57% of all LLM rounds, while most turns required lighter processing but were billed at frontier rates because users defaulted to the strongest model.
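The traffic statistic is a concentration measure: sort turns by how many LLM rounds each used, then take the share of all rounds consumed by the heaviest 10%. This sketch assumes per-turn round counts are available from logs; `top_share` is a hypothetical helper, and the toy data is invented and does not reproduce Augment's 57% figure.

```python
def top_share(rounds_per_turn: list[int], top_percent: int = 10) -> float:
    """Fraction of all LLM rounds consumed by the heaviest `top_percent`% of turns."""
    if not rounds_per_turn or sum(rounds_per_turn) == 0:
        raise ValueError("need at least one turn with a nonzero round count")
    ranked = sorted(rounds_per_turn, reverse=True)
    # Integer ceiling avoids float error, e.g. 10% of 25 turns counts the top 3.
    k = max(1, (len(ranked) * top_percent + 99) // 100)
    return sum(ranked[:k]) / sum(ranked)


# Toy data: LLM rounds used by each of ten user turns (invented, not Augment's logs).
toy = [30, 3, 2, 5, 4, 3, 2, 6, 3, 2]
print(f"{top_share(toy):.0%}")  # 50%
```

A highly skewed distribution like this is what makes per-turn routing pay off: the few heavy turns justify a frontier model, while the many light turns can run on cheaper ones.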
What this means
Prism addresses a real cost problem in production AI coding agents: teams overpay by routing every request through a frontier model even when a cheaper one would suffice. The 20-30% cost reduction is meaningful at scale, though the quality-cost tradeoff depends heavily on benchmark selection. On single-task benchmarks like SWE-Bench Pro, Prism saves only 6-12% and gives up some pass rate, likely because most of those problems require frontier capabilities. The internal multi-turn benchmark, which shows larger gains, may better reflect real workload diversity, but it is proprietary and limited to Go, so generalization to other codebases remains unverified. Cache-aware routing is the technical innovation here: naive per-request routing would lose much of its savings to cache evictions.