model release

Google releases Gemma 4 31B with 256K context and configurable reasoning mode

TL;DR

Google DeepMind has released Gemma 4 31B, a 30.7-billion-parameter multimodal model supporting text and image input. The model features a 262,144-token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages under Apache 2.0 license.

2 min read
0

Gemma 4 31B Instruct — Quick Specs

Context window262K tokens
Input$0.14/1M tokens
Output$0.4/1M tokens

Google Releases Gemma 4 31B Multimodal Model

Google DeepMind has released Gemma 4 31B, a 30.7-billion-parameter dense multimodal model designed for both text and image input processing. The model launched on April 2, 2026, and is available through OpenRouter at $0.14 per million input tokens and $0.40 per million output tokens.

Key Specifications

The Gemma 4 31B Instruct variant includes a 256,144-token context window—among the largest for models in its parameter class. The model supports configurable thinking/reasoning mode, enabling step-by-step reasoning for complex tasks. It includes native function calling capabilities and multilingual support across 140+ languages.

Google is releasing the model under the Apache 2.0 open license, allowing commercial and research use with minimal restrictions.

Capabilities and Performance

According to Google DeepMind, Gemma 4 31B demonstrates particular strength in three areas: coding tasks, reasoning-heavy problems, and document understanding. The configurable reasoning mode allows developers to trade latency for reasoning depth—enabling the model to show its internal thought process before producing final answers.

The multimodal architecture supports both text and image input, though the model outputs text only. This positions it as a document analysis and visual question-answering tool.

What This Means

Gemma 4 31B enters a crowded market of open 30B-class models from Meta (Llama), Mistral, and others, but differentiates on three fronts: the massive 256K context window (useful for long document processing), the explicit reasoning mode (reflecting broader industry trend toward chain-of-thought capabilities), and the Apache 2.0 license (lowest legal friction for commercial deployment).

The pricing—$0.14/$0.40 input/output—is competitive with similar-scale open models. The 256K context window is particularly notable; it enables processing of entire codebases or lengthy documents in a single request, reducing the need for context management and retrieval systems.

For organizations deploying locally or on proprietary infrastructure, the open-source weights and permissive license remove API dependency concerns. The 30.7B parameter count positions it as deployable on consumer-grade hardware (though requiring 60GB+ VRAM for full precision).

Related Articles

model release

Google releases Gemini Omni Flash video generation model with conversational editing, withholds speech synthesis

Google DeepMind released Gemini Omni Flash, the first model in its new Omni family that generates and edits video from image, audio, video, and text inputs. The model is rolling out to Gemini app subscribers and YouTube Shorts with a 10-second clip limit, while speech-editing capabilities remain withheld pending safety testing.

model release

Google releases Gemini 3.5 Flash with 4x faster output and agentic capabilities, 3.5 Pro coming June

Google released Gemini 3.5 Flash today with 4x faster output token generation than competing frontier models while surpassing Gemini 3.1 Pro on coding, agentic, and multimodal benchmarks. The company announced Gemini 3.5 Pro will launch next month and introduced Gemini Omni, a new multimodal series that outputs video.

model release

Perceptron Launches Mk1 Vision-Language Model with Video Reasoning at $0.15/$1.50 per 1M Tokens

Perceptron has released Perceptron Mk1, a vision-language model designed for video understanding and embodied reasoning tasks. The model accepts image and video inputs with 33K context window, priced at $0.15 per 1M input tokens and $1.50 per 1M output tokens, and supports structured spatial annotations on demand.

model release

Google launches Gemini 3.5 Flash and new Omni multimodal AI family at I/O 2026

Google launched Gemini 3.5 Flash today as the default model for its Gemini app and AI Mode in Search, with Gemini 3.5 Pro following next month. The company also introduced Gemini Omni, a new multimodal AI family capable of generating video from text, photos, video, and audio inputs.

Comments

Loading...