Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens
Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.
On May 7, 2026, Google released Gemini 3.1 Flash Lite, a high-efficiency multimodal model priced at $0.25 per million input tokens and $1.50 per million output tokens. The model features a 1,048,576-token context window and, according to Google, costs half as much as Gemini 3 Flash.
Core specifications
- Context window: 1,048,576 tokens
- Input pricing: $0.25 per million tokens
- Output pricing: $1.50 per million tokens
- Modalities: Text, image, video, audio, and PDF inputs
- Release date: May 7, 2026
Technical capabilities
Gemini 3.1 Flash Lite supports four distinct thinking levels: minimal, low, medium, and high. These levels allow developers to fine-tune the trade-off between API cost and model performance based on task complexity. Google designed the model specifically for low-latency, high-volume workloads.
The model is optimized for lightweight agentic workflows and simple data extraction tasks. According to Google, the model prioritizes responsiveness and API cost efficiency over maximum capability.
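As a rough illustration of how a developer might select one of the four thinking levels per request, the sketch below builds an OpenAI-compatible chat-completions payload of the kind OpenRouter accepts. The model slug and the exact shape of the `reasoning` field are assumptions for illustration, not confirmed documentation.

```python
# Sketch: choosing a thinking level per request.
# The "reasoning" field shape and the model slug are assumptions
# based on OpenRouter's OpenAI-compatible request format.

THINKING_LEVELS = ("minimal", "low", "medium", "high")

def build_request(prompt: str, level: str = "minimal") -> dict:
    """Build a chat-completions payload with a chosen thinking level."""
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level!r}")
    return {
        "model": "google/gemini-3.1-flash-lite",  # assumed slug
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": level},  # assumed parameter shape
    }

payload = build_request("Extract the invoice total from this text.", "low")
```

A high-volume extraction pipeline would typically default to `minimal` and escalate to a higher level only for inputs that fail validation, keeping average cost near the floor.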
Thinking mode integration
The model supports reasoning-enabled requests through OpenRouter's reasoning parameter. Developers can access the model's step-by-step thinking process through the reasoning_details array in API responses. When continuing conversations, the complete reasoning_details must be preserved in message history for the model to maintain reasoning continuity.
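The preservation requirement above can be sketched as a small helper that copies the `reasoning_details` array from each assistant response into the message history. The response object here is a stand-in dictionary, not a live API call; field names follow the article's description.

```python
# Sketch: carrying reasoning_details forward across turns so the model
# can maintain reasoning continuity. The response dict is a stand-in.

def append_assistant_turn(history: list, response_message: dict) -> list:
    """Append an assistant turn, preserving its reasoning_details."""
    turn = {"role": "assistant", "content": response_message["content"]}
    if "reasoning_details" in response_message:
        # Keep the complete array intact -- dropping or truncating it
        # would break reasoning continuity on the next request.
        turn["reasoning_details"] = response_message["reasoning_details"]
    history.append(turn)
    return history

history = [{"role": "user", "content": "Summarize the report."}]
stub_response = {
    "content": "Summary of the report...",
    "reasoning_details": [{"type": "reasoning.text", "text": "First, ..."}],
}
history = append_assistant_turn(history, stub_response)
```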
Pricing positioning
At $0.25 per million input tokens, Gemini 3.1 Flash Lite is priced at 50% of Gemini 3 Flash's cost. The output pricing of $1.50 per million tokens represents a 6:1 output-to-input price ratio, standard for Google's Flash tier models.
What this means
Gemini 3.1 Flash Lite fills a specific market gap for applications requiring multimodal understanding at scale where cost and latency are primary constraints. The four-level thinking system gives developers granular control over the reasoning-cost trade-off, unusual for a "lite" model tier. However, Google has not disclosed benchmark scores or parameter count, making it difficult to assess performance relative to competing models like GPT-4o mini or Claude 3.5 Haiku. The 1M context window matches Gemini 3 Flash, suggesting Google maintained context capability while reducing computational requirements elsewhere in the model architecture.