Google caps single-prompt quota for Gemini 3.1 Pro, makes Flash-Lite free after usage limit complaints
Google has modified Gemini's compute-based usage limits introduced at I/O 2026 after users reported depleting quotas too quickly. The company is now capping how much quota a single Gemini 3.1 Pro prompt can consume and making all 3.1 Flash-Lite prompts free.
Google has adjusted Gemini's new compute-based usage limits one week after their introduction at I/O 2026, responding to user complaints about hitting quotas too quickly.
Key changes
The company is now capping the amount of quota a single prompt can use when accessing Gemini 3.1 Pro, according to Gemini lead Josh Woodward. This addresses issues where complex prompts with large files were rapidly depleting user limits.
Google has made all Gemini 3.1 Flash-Lite prompts free, with these requests no longer counting against user quotas.
The company also doubled the number of Omni generations available to Google AI Ultra subscribers after fixing a bug where "just one or two Omni videos" would drain quotas for some users.
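The per-prompt cap and the Flash-Lite exemption can be pictured as a small charging rule. The sketch below is illustrative only: Google has not published the cap value, the quota unit, or its internal model identifiers, so `PER_PROMPT_CAP`, the model strings, and `quota_charge` are all hypothetical.

```python
# Hypothetical values: the real cap and its unit are not public.
PER_PROMPT_CAP = 50.0

# Illustrative identifiers, not confirmed API model names.
FREE_MODELS = {"gemini-3.1-flash-lite"}
CAPPED_MODELS = {"gemini-3.1-pro"}


def quota_charge(model: str, raw_cost: float) -> float:
    """Return the quota units charged for one successful prompt."""
    if model in FREE_MODELS:
        # Flash-Lite prompts no longer count against the quota.
        return 0.0
    if model in CAPPED_MODELS:
        # A single Pro prompt can never consume more than the cap.
        return min(raw_cost, PER_PROMPT_CAP)
    # Other models: charged at their raw cost (an assumption).
    return raw_cost
```

Under these assumed numbers, a Pro prompt with a large attachment that would otherwise cost 500 units is charged 50, while the same prompt sent to Flash-Lite is charged nothing.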
How the compute-based system works
Google introduced compute-based usage limits at I/O 2026 to replace daily prompt limits. The system accounts for prompt complexity, tools used, and chat length. Limits refresh every five hours until a weekly cap is reached. According to Google, "a simple text prompt uses far less compute than a complex video or coding prompt."
Failed requests no longer count against limits. Google stated: "If a request fails, you won't be charged. Our system mistakes are on us, not you. Your quota is used only for successful completions."
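One way to read the five-hour refresh, the weekly limit, and the rule that failed requests are not charged is as a two-level budget. This is a minimal sketch under stated assumptions, not Google's implementation: the `QuotaTracker` class, its budget figures, and the fixed (rather than rolling) window are all hypothetical, and the weekly reset is left out for brevity.

```python
from dataclasses import dataclass

FIVE_HOURS = 5 * 60 * 60  # refresh period in seconds, per the article


@dataclass
class QuotaTracker:
    window_budget: float      # hypothetical units per five-hour window
    weekly_budget: float      # hypothetical units per week
    window_start: float = 0.0
    window_used: float = 0.0
    weekly_used: float = 0.0

    def _refresh(self, now: float) -> None:
        # Open a fresh window once five hours have elapsed.
        if now - self.window_start >= FIVE_HOURS:
            self.window_start = now
            self.window_used = 0.0

    def can_afford(self, cost: float, now: float) -> bool:
        # A request must fit in both the current window and the week.
        self._refresh(now)
        return (self.window_used + cost <= self.window_budget
                and self.weekly_used + cost <= self.weekly_budget)

    def record(self, cost: float, now: float, succeeded: bool) -> bool:
        """Charge a finished request; returns True only if quota was used."""
        if not succeeded:
            # Failed requests are never charged.
            return False
        self._refresh(now)
        self.window_used += cost
        self.weekly_used += cost
        return True
```

In this model, a user who exhausts a window regains capacity at the next refresh, but once the weekly total is spent, further refreshes no longer help until the week rolls over.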
Additional improvements
Google plans to provide more detailed usage breakdowns and notifications for compute-intensive tasks like Deep Research. The current gemini.google.com/usage dashboard provides only a high-level overview.
The company will introduce pay-as-you-go top-up AI credits for users who need additional quota.
Google also confirmed that when users select a specific model, the system remembers that choice across future sessions unless manually changed or a quota cap triggers automatic fallback to a lighter model.
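The remembered-model behavior can be sketched as a stored preference with a temporary override. Everything here is an assumption made for illustration: the `ModelPreference` class, the model strings, and in particular the choice to leave the saved selection untouched during a fallback, which Google's statement does not spell out.

```python
from typing import Optional

# Illustrative identifiers, not confirmed API model names.
LIGHTER_MODEL = "gemini-3.1-flash-lite"


class ModelPreference:
    """Keeps the user's model choice across sessions (hypothetical)."""

    def __init__(self, default_model: str = LIGHTER_MODEL) -> None:
        self.default_model = default_model
        self.saved_choice: Optional[str] = None

    def select(self, model: str) -> None:
        # A manual change replaces the remembered choice.
        self.saved_choice = model

    def resolve(self, quota_capped: bool) -> str:
        """Return the model to use for the next prompt."""
        if quota_capped:
            # Automatic fallback; the saved choice is kept (assumption).
            return LIGHTER_MODEL
        return self.saved_choice or self.default_model
```

With this reading, a user who picked the Pro model keeps getting it in later sessions, is served the lighter model while capped, and returns to Pro once quota is available again.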
What this means
The rapid adjustments suggest that Google's initial compute-based limits were too restrictive for real-world usage patterns. By capping per-prompt consumption and making the lighter Flash-Lite tier free, Google is trying to balance resource management with user experience. Exempting Flash-Lite from quotas suggests Google wants to retain casual users while reserving compute limits primarily for its more powerful Pro models. The Omni video generation bug indicates the new system launched with technical issues that required immediate correction.