model release

Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens

TL;DR

Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.

2 min read
0

Gemini 3.1 Flash Lite — Quick Specs

Context window1049K tokens
Input$0.25/1M tokens
Output$1.5/1M tokens

Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens

Google released Gemini 3.1 Flash Lite on May 7, 2026, a high-efficiency multimodal model priced at $0.25 per million input tokens and $1.50 per million output tokens. The model features a 1,048,576 token context window and is priced at half the cost of Gemini 3 Flash, according to Google.

Core specifications

  • Context window: 1,048,576 tokens
  • Input pricing: $0.25 per million tokens
  • Output pricing: $1.50 per million tokens
  • Modalities: Text, image, video, audio, and PDF inputs
  • Release date: May 7, 2026

Technical capabilities

Gemini 3.1 Flash Lite supports four distinct thinking levels: minimal, low, medium, and high. These levels allow developers to fine-tune the trade-off between API cost and model performance based on task complexity. Google designed the model specifically for low-latency, high-volume workloads.

The model is optimized for lightweight agentic workflows and simple data extraction tasks. According to Google, the model prioritizes responsiveness and API cost efficiency over maximum capability.

Thinking mode integration

The model supports reasoning-enabled requests through OpenRouter's reasoning parameter. Developers can access the model's step-by-step thinking process through the reasoning_details array in API responses. When continuing conversations, the complete reasoning_details must be preserved in message history for the model to maintain reasoning continuity.

Pricing positioning

At $0.25 per million input tokens, Gemini 3.1 Flash Lite is priced at 50% of Gemini 3 Flash's cost. The output token pricing of $1.50 per million represents a 6:1 output-to-input ratio, standard for Google's Flash tier models.

What this means

Gemini 3.1 Flash Lite fills a specific market gap for applications requiring multimodal understanding at scale where cost and latency are primary constraints. The four-level thinking system gives developers granular control over the reasoning-cost trade-off, unusual for a "lite" model tier. However, Google has not disclosed benchmark scores or parameter count, making it difficult to assess performance relative to competing models like GPT-4o mini or Claude 3.5 Haiku. The 1M context window matches Gemini 3 Flash, suggesting Google maintained context capability while reducing computational requirements elsewhere in the model architecture.

Related Articles

model release

Google releases Gemini 3.1 Flash Image, claims Pro-level quality at $0.50 per 1M tokens

Google has released Gemini 3.1 Flash Image, internally codenamed "Nano Banana 2," an image generation and editing model with a 131K context window. The model is priced at $0.50 per 1M input tokens and $3 per 1M output tokens.

model release

Mistral OCR 3 launches at $2 per 1,000 pages with 74% win rate over previous version

Mistral AI released Mistral OCR 3, a document extraction model priced at $2 per 1,000 pages ($1 with Batch API discount). The model achieves a 74% overall win rate over its predecessor on forms, scanned documents, complex tables, and handwriting according to internal benchmarks.

model release

Mistral Releases Mistral 3 Family: 675B-Parameter Large 3 MoE and Three Edge Models Under Apache 2.0

Mistral has released Mistral 3, including Mistral Large 3—a sparse mixture-of-experts model with 41B active and 675B total parameters—and three Ministral 3 edge models (3B, 8B, 14B). All models are released under Apache 2.0 license with multimodal capabilities and are available today on multiple platforms.

model release

Google releases Nano Banana Pro image generation model with 2K/4K output and five-subject identity preservation

Google has released Nano Banana Pro, an advanced image generation and editing model built on Gemini 3 Pro. The model supports 2K/4K output resolution, preserves identity across up to five subjects, and includes real-time Search grounding for context-rich visual synthesis.

Comments

Loading...