Google releases Gemma 4 31B with 256K context and configurable reasoning mode
Google DeepMind has released Gemma 4 31B, a 30.7-billion-parameter multimodal model supporting text and image input. The model features a 262,144-token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages under Apache 2.0 license.
Gemma 4 31B Instruct — Quick Specs
Google Releases Gemma 4 31B Multimodal Model
Google DeepMind has released Gemma 4 31B, a 30.7-billion-parameter dense multimodal model designed for both text and image input processing. The model launched on April 2, 2026, and is available through OpenRouter at $0.14 per million input tokens and $0.40 per million output tokens.
Key Specifications
The Gemma 4 31B Instruct variant includes a 256,144-token context window—among the largest for models in its parameter class. The model supports configurable thinking/reasoning mode, enabling step-by-step reasoning for complex tasks. It includes native function calling capabilities and multilingual support across 140+ languages.
Google is releasing the model under the Apache 2.0 open license, allowing commercial and research use with minimal restrictions.
Capabilities and Performance
According to Google DeepMind, Gemma 4 31B demonstrates particular strength in three areas: coding tasks, reasoning-heavy problems, and document understanding. The configurable reasoning mode allows developers to trade latency for reasoning depth—enabling the model to show its internal thought process before producing final answers.
The multimodal architecture supports both text and image input, though the model outputs text only. This positions it as a document analysis and visual question-answering tool.
What This Means
Gemma 4 31B enters a crowded market of open 30B-class models from Meta (Llama), Mistral, and others, but differentiates on three fronts: the massive 256K context window (useful for long document processing), the explicit reasoning mode (reflecting broader industry trend toward chain-of-thought capabilities), and the Apache 2.0 license (lowest legal friction for commercial deployment).
The pricing—$0.14/$0.40 input/output—is competitive with similar-scale open models. The 256K context window is particularly notable; it enables processing of entire codebases or lengthy documents in a single request, reducing the need for context management and retrieval systems.
For organizations deploying locally or on proprietary infrastructure, the open-source weights and permissive license remove API dependency concerns. The 30.7B parameter count positions it as deployable on consumer-grade hardware (though requiring 60GB+ VRAM for full precision).
Related Articles
Mistral releases Leanstral 1.5: 119B parameter open-source model for Lean 4 proof assistance
Mistral AI has released Leanstral 1.5, an open-source 119B parameter mixture-of-experts model designed specifically for Lean 4 proof assistance. The model features 128 experts with 4 active per token (6.5B activated parameters), a 256k token context window, and multimodal input capabilities.
Portugal releases Amália, open-source 9B parameter AI model trained on European Portuguese
Portugal has released Amália, its first national AI model trained specifically for European Portuguese. Built on EuroLLM-9B with 9 billion parameters, the model is fully open-source with weights, datasets, and code published under an open license. The government has committed €5.5m in initial funding through 2027.
DeepReinforce Releases Ornith-1.0, Open-Source Agentic Coding Model in 9B to 397B Sizes
DeepReinforce has released Ornith-1.0, an MIT-licensed model designed for agentic coding tasks with variants ranging from 9B to 397B parameters. Built on top of Apache 2.0-licensed Gemma 4 and Qwen 3.5 base models, the company claims it achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks.
DeepSeek Releases V4 Models: 1M Context Window, 90% Less KV Cache Than V3
DeepSeek has released two new MoE models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both models support a one million token context window and use a hybrid attention architecture that requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2.
Comments
Loading...