Google caps single-prompt quota for Gemini 3.1 Pro, makes Flash-Lite free after usage limit complaints

TL;DR

Google has modified Gemini's compute-based usage limits introduced at I/O 2026 after users reported depleting quotas too quickly. The company is now capping how much quota a single Gemini 3.1 Pro prompt can consume and making all 3.1 Flash-Lite prompts free.

May 29, 2026 · 2:20 AM · 2 min read

Google has adjusted Gemini's new compute-based usage limits one week after their introduction at I/O 2026, responding to user complaints about hitting quotas too quickly.

Key changes

The company is now capping the amount of quota a single prompt can use when accessing Gemini 3.1 Pro, according to Gemini lead Josh Woodward. The cap targets cases in which complex prompts with large files rapidly depleted a user's limit.

Google has made all Gemini 3.1 Flash-Lite prompts free, with these requests no longer counting against user quotas.

The company also doubled the number of Omni generations available to Google AI Ultra subscribers after fixing a bug where "just one or two Omni videos" would drain quotas for some users.

How the compute-based system works

Google introduced compute-based usage limits at I/O 2026 to replace previous approaches. The system accounts for prompt complexity, tools used, and chat length. Usage refreshes every 5 hours until the weekly limit is reached. According to Google, "a simple text prompt uses far less compute than a complex video or coding prompt."

Failed requests no longer count against limits. Google stated: "If a request fails, you won't be charged. Our system mistakes are on us, not you. Your quota is used only for successful completions."
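The rules described above (a per-prompt cap for Gemini 3.1 Pro, free Flash-Lite prompts, no charge for failed requests, and a 5-hour refresh within a weekly limit) can be sketched as a small accounting model. This is only an illustration: Google has not published its units, budgets, or cap values, so every number, model label, and name in the block is a hypothetical assumption.

```python
from dataclasses import dataclass

# Hypothetical values: Google has not published its units or cap sizes.
PER_PROMPT_CAP = {"gemini-3.1-pro": 50.0}   # assumed max units per single prompt
FREE_MODELS = {"gemini-3.1-flash-lite"}     # prompts that never consume quota


@dataclass
class QuotaTracker:
    window_budget: float      # assumed units available per 5-hour window
    weekly_budget: float      # assumed units available per week
    window_used: float = 0.0
    weekly_used: float = 0.0

    def charge(self, model: str, estimated_cost: float, succeeded: bool) -> float:
        """Return the units actually deducted for one request."""
        if not succeeded:
            return 0.0        # failed requests are not charged
        if model in FREE_MODELS:
            return 0.0        # Flash-Lite prompts are free
        # Clamp the cost of a single prompt if the model has a cap.
        cap = PER_PROMPT_CAP.get(model, estimated_cost)
        cost = min(estimated_cost, cap)
        self.window_used += cost
        self.weekly_used += cost
        return cost

    def refresh_window(self) -> None:
        """Reset the 5-hour window; the weekly total keeps accumulating."""
        self.window_used = 0.0

    def can_send(self) -> bool:
        """A paid prompt is allowed only while both budgets have room."""
        return (self.window_used < self.window_budget
                and self.weekly_used < self.weekly_budget)
```

In this sketch, a Pro prompt estimated at 80 units would be charged only 50, while the same prompt sent to Flash-Lite, or one that fails, would be charged nothing.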

Additional improvements

Google plans to provide more detailed usage breakdowns and notifications for compute-intensive tasks like Deep Research. The current gemini.google.com/usage dashboard provides only a high-level overview.

The company will introduce pay-as-you-go top-up AI credits for users who need additional quota.

Google also confirmed that when users select a specific model, the system remembers that choice across future sessions unless the user changes it manually or a quota cap triggers automatic fallback to a lighter model.

What this means

The rapid adjustments reveal that Google's initial compute-based limit implementation was too restrictive for real-world usage patterns. By capping per-prompt consumption and making the lighter Flash-Lite tier free, Google is attempting to balance resource management with user experience. The move to exempt Flash-Lite from quotas suggests Google wants to retain casual users while reserving compute limits primarily for its more powerful Pro models. The bug affecting Omni video generation indicates the new system launched with significant technical issues that required immediate correction.

Source: 9to5google.com
