Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June

TL;DR

Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.

Google released Gemini 3.5 Flash today at I/O 2026, claiming the model delivers 4x faster output tokens per second compared to other frontier models while maintaining Flash series pricing. According to Google, the model surpasses Gemini 3.1 Pro in coding, agentic, and multimodal benchmarks.

The model is available immediately in the Gemini app, Google Search, Antigravity 2.0, and via the Gemini API. Gemini 3.5 Pro is currently in testing and will launch next month.

New Gemini Omni multimodal series

Google introduced Gemini Omni, a new model family that combines reasoning with content creation. Gemini Omni Flash accepts image, audio, video, and text inputs and produces editable video that Google describes as "grounded in real-world knowledge." The model is rolling out to AI Plus, Pro, and Ultra subscribers in the Gemini app, Google Flow, and YouTube Shorts.

The company also launched Google Flow and Flow Music as standalone mobile apps, with Flow available on Android in beta and Flow Music launching on iOS first.

Gemini Spark agentic assistant

Google announced Gemini Spark, described as "your personal agent" that performs tasks autonomously. The system integrates with Gmail, Docs, and other Google Workspace apps, with third-party tool support via MCP (Model Context Protocol) coming this summer.

Gemini Spark will be available next week exclusively to Google AI Ultra subscribers in the US. According to Google, the service transforms Gemini "from an assistant that can answer your questions into an active partner that does real work on your behalf."

Pricing changes for Google AI plans

Google restructured its subscription tiers. AI Ultra now starts at $100 per month (previously $250) and offers 5x higher usage limits than AI Pro. A new $200 tier replaces the previous $250 plan and keeps that plan's capabilities.
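To put the price changes in relative terms, the short calculation below works out the percentage change from the previous $250 price to each of the two new price points. It uses only the dollar figures quoted above; the function and variable names are illustrative.

```python
def percent_change(old_price: float, new_price: float) -> float:
    """Return the change from old_price to new_price as a percentage."""
    return (new_price - old_price) * 100 / old_price


# Monthly prices in USD, as quoted in the article.
PREVIOUS_ULTRA = 250
NEW_ULTRA_ENTRY = 100
NEW_TOP_TIER = 200

entry_change = percent_change(PREVIOUS_ULTRA, NEW_ULTRA_ENTRY)  # -60.0
top_tier_change = percent_change(PREVIOUS_ULTRA, NEW_TOP_TIER)  # -20.0

print(f"AI Ultra entry price: {entry_change:.0f}%")
print(f"Tier replacing the $250 plan: {top_tier_change:.0f}%")
```

On these figures, the entry price for AI Ultra drops by 60%, and the tier that replaces the old $250 plan costs 20% less.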

The company is shifting from daily prompt limits to a "compute-used" model that accounts for prompt complexity, features used, and chat length. Limits refresh every five hours until reaching a weekly cap.
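Google has not published how "compute-used" is calculated, so the following is only a rough sketch of how a five-hour refresh combined with a weekly cap could behave. The cost weights, the limit values, and the names `ComputeBudget` and `prompt_cost` are all assumptions made for illustration, and the weekly reset itself is not modeled.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60 * 60  # five-hour refresh interval, per the article


def prompt_cost(complexity: float, features_used: int, chat_turns: int) -> float:
    """Hypothetical cost formula; the weights are invented for illustration."""
    return complexity + 0.5 * features_used + 0.1 * chat_turns


@dataclass
class ComputeBudget:
    window_limit: float  # assumed units allowed per five-hour window
    weekly_cap: float    # assumed units allowed per week
    window_start: float = 0.0
    window_used: float = 0.0
    week_used: float = 0.0

    def try_spend(self, cost: float, now: float) -> bool:
        """Spend `cost` units at time `now` (seconds) if both limits allow it."""
        # Open a fresh window once five hours have passed.
        if now - self.window_start >= WINDOW_SECONDS:
            self.window_start = now
            self.window_used = 0.0
        # Reject the request if it would exceed the current window's allowance.
        if self.window_used + cost > self.window_limit:
            return False
        # Reject the request if it would exceed the weekly cap.
        if self.week_used + cost > self.weekly_cap:
            return False
        self.window_used += cost
        self.week_used += cost
        return True


budget = ComputeBudget(window_limit=10, weekly_cap=15)
first = budget.try_spend(8, now=0)                     # fits in the first window
second = budget.try_spend(5, now=60)                   # would exceed the window limit
third = budget.try_spend(5, now=WINDOW_SECONDS)        # allowed after the refresh
fourth = budget.try_spend(5, now=2 * WINDOW_SECONDS)   # blocked by the weekly cap
```

In this sketch a user who exhausts a window regains access after five hours, but repeated heavy use eventually hits the weekly cap no matter how many windows have refreshed.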

Search and productivity updates

AI Mode in Google Search now runs on Gemini 3.5 Flash. New "information agents" will monitor the web 24/7 for topics users specify, available to AI Pro and Ultra subscribers this summer. Google Search will also gain the ability to build custom dashboards and trackers for ongoing tasks.

Gmail Live, a conversational email search feature, rolls out to AI Pro and Ultra subscribers in the US this summer on Android and iOS. Docs Live for conversational document creation and editing launches simultaneously for the same subscriber tiers.

What this means

Google's 4x speed claim for Gemini 3.5 Flash positions it directly against the fast-inference models from Anthropic and OpenAI. Gemini Omni's video output is a significant multimodal expansion, though independent quality benchmarks are not yet available.

The restructured pricing and compute-based limits suggest Google is trying to balance access against infrastructure costs as model capabilities increase. The planned MCP support for Gemini Spark indicates Google is adopting an industry standard for agent interoperability rather than building a proprietary ecosystem.
