Google releases Gemini 3.1 Flash Lite with 1M context at $0.25 per million input tokens
Google has released Gemini 3.1 Flash Lite, a high-efficiency multimodal model with a 1,048,576 token context window priced at $0.25 per million input tokens and $1.50 per million output tokens. The model supports text, image, video, audio, and PDF inputs with four thinking levels for cost-performance optimization.
On May 7, 2026, Google released Gemini 3.1 Flash Lite, a high-efficiency multimodal model priced at $0.25 per million input tokens and $1.50 per million output tokens. The model features a 1,048,576-token context window and, according to Google, costs half as much as Gemini 3 Flash.
Core specifications
- Context window: 1,048,576 tokens
- Input pricing: $0.25 per million tokens
- Output pricing: $1.50 per million tokens
- Modalities: Text, image, video, audio, and PDF inputs
- Release date: May 7, 2026
Technical capabilities
Gemini 3.1 Flash Lite supports four distinct thinking levels: minimal, low, medium, and high. These levels allow developers to fine-tune the trade-off between API cost and model performance based on task complexity. Google designed the model specifically for low-latency, high-volume workloads.
The model is optimized for lightweight agentic workflows and simple data extraction tasks. According to Google, the model prioritizes responsiveness and API cost efficiency over maximum capability.
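As a rough illustration of how a developer might select one of the four thinking levels per request, the sketch below builds an OpenAI-compatible chat-completions payload of the kind OpenRouter accepts. The model slug and the exact shape of the `reasoning` field are assumptions for illustration, not confirmed documentation.

```python
# Sketch: choosing a thinking level per request.
# The "reasoning" field shape and the model slug are assumptions
# based on OpenRouter's OpenAI-compatible request format.

THINKING_LEVELS = ("minimal", "low", "medium", "high")

def build_request(prompt: str, level: str = "minimal") -> dict:
    """Build a chat-completions payload with a chosen thinking level."""
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level!r}")
    return {
        "model": "google/gemini-3.1-flash-lite",  # assumed slug
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": level},  # assumed parameter shape
    }

payload = build_request("Extract the invoice total from this text.", "low")
```

A high-volume extraction pipeline would typically default to `minimal` and escalate to a higher level only for inputs that fail validation, keeping average cost near the floor.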
Thinking mode integration
The model supports reasoning-enabled requests through OpenRouter's reasoning parameter. Developers can access the model's step-by-step thinking process through the reasoning_details array in API responses. When continuing conversations, the complete reasoning_details must be preserved in message history for the model to maintain reasoning continuity.
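The preservation requirement above can be sketched as a small helper that copies the `reasoning_details` array from each assistant response into the message history. The response object here is a stand-in dictionary, not a live API call; field names follow the article's description.

```python
# Sketch: carrying reasoning_details forward across turns so the model
# can maintain reasoning continuity. The response dict is a stand-in.

def append_assistant_turn(history: list, response_message: dict) -> list:
    """Append an assistant turn, preserving its reasoning_details."""
    turn = {"role": "assistant", "content": response_message["content"]}
    if "reasoning_details" in response_message:
        # Keep the complete array intact -- dropping or truncating it
        # would break reasoning continuity on the next request.
        turn["reasoning_details"] = response_message["reasoning_details"]
    history.append(turn)
    return history

history = [{"role": "user", "content": "Summarize the report."}]
stub_response = {
    "content": "Summary of the report...",
    "reasoning_details": [{"type": "reasoning.text", "text": "First, ..."}],
}
history = append_assistant_turn(history, stub_response)
```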
Pricing positioning
At $0.25 per million input tokens, Gemini 3.1 Flash Lite is priced at 50% of Gemini 3 Flash's cost. The output pricing of $1.50 per million tokens represents a 6:1 output-to-input price ratio, standard for Google's Flash tier models.
What this means
Gemini 3.1 Flash Lite fills a specific market gap for applications requiring multimodal understanding at scale where cost and latency are primary constraints. The four-level thinking system gives developers granular control over the reasoning-cost trade-off, unusual for a "lite" model tier. However, Google has not disclosed benchmark scores or parameter count, making it difficult to assess performance relative to competing models like GPT-4o mini or Claude 3.5 Haiku. The 1M context window matches Gemini 3 Flash, suggesting Google maintained context capability while reducing computational requirements elsewhere in the model architecture.