model release

Google Releases Gemini 3.5 Flash with 1M Token Context and Configurable Thinking Modes at $1.50/$9 Per Million Tokens

TL;DR

Google has released Gemini 3.5 Flash, a multimodal model with a 1 million token context window priced at $1.50 per million input tokens and $9 per million output tokens. The model supports text, image, video, audio, and PDF inputs with configurable thinking effort levels from minimal to high.

2 min read
0

Google Releases Gemini 3.5 Flash with 1M Context and Thinking Modes

Google has released Gemini 3.5 Flash, a multimodal model priced at $1.50 per million input tokens and $9 per million output tokens. The model features a 1 million token context window and supports text, image, video, audio, and PDF inputs.

Key Specifications

Gemini 3.5 Flash is positioned as a high-efficiency model delivering what Google describes as "near-Pro level coding and reasoning at Flash-tier cost and speed." The model defaults to medium thinking effort for standard responses but supports four configurable thinking levels: minimal, low, medium, and high.

The thinking mode configuration allows developers to make explicit cost-performance trade-offs based on task complexity. This feature is designed for parallel agentic execution loops where different subtasks may require different computational resources.

Technical Capabilities

According to Google, the model is "highly optimized for coding proficiency" and multimodal processing. The 1 million token context window positions it for handling large codebases, extensive documentation, and long-form content analysis.

The multimodal capabilities extend across five input types: text, static images, video, audio, and PDF documents. This broad input support makes the model applicable to document processing, multimedia analysis, and complex reasoning tasks that span multiple data formats.

Pricing and Availability

At $1.50 per million input tokens and $9 per million output tokens, Gemini 3.5 Flash is priced competitively in the Flash model tier. The model is available through OpenRouter with routing to multiple providers for reliability and uptime optimization.

The release date is listed as May 19, 2026 in the source material, though this appears to be a future date and may represent a placeholder or projected availability timeline.

What This Means

Gemini 3.5 Flash's configurable thinking modes represent a shift toward explicit computational trade-offs in model inference. Rather than offering a single performance point, developers can adjust reasoning depth based on task requirements—a feature particularly relevant for agentic workflows where some operations need deep reasoning while others prioritize speed.

The 1M context window combined with multimodal support and competitive pricing positions this model for code analysis, document processing, and complex multi-step reasoning tasks. The thinking mode feature may influence how other providers structure their model offerings, particularly for use cases requiring variable computational intensity across different parts of a workflow.

Related Articles

model release

Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June

Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.

model release

Google launches Gemini 3.5 Flash and new Omni multimodal AI family at I/O 2026

Google launched Gemini 3.5 Flash today as the default model for its Gemini app and AI Mode in Search, with Gemini 3.5 Pro following next month. The company also introduced Gemini Omni, a new multimodal AI family capable of generating video from text, photos, video, and audio inputs.

model release

Google launches Gemini Omni Flash, multimodal video generation model available to AI Plus subscribers

Google has released Gemini Omni Flash, the first model in its new Gemini Omni family designed to generate video content from text, images, video, and audio inputs. The model is available now to AI Plus subscribers, with free access coming to YouTube Shorts and YouTube Create later this week.

model release

Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis

Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.

Comments

Loading...