Amazon Bedrock adds Gemma 4 models with 256K context and built-in reasoning mode

TL;DR

Amazon Web Services today announced availability of Google DeepMind's Gemma 4 family on Amazon Bedrock. The open-weight models include three instruction-tuned variants spanning 2.3B to 30.7B parameters, with 256K context windows, multimodal input support, and built-in reasoning mode.

June 15, 2026 · 8:35 PM2 min read

Gemma 4 31B — Quick Specs

Context window256K tokens

Compare Gemma 4 31B with other models →

Three model variants

The Gemma 4 family includes:

Gemma 4 31B: Dense architecture with 30.7B parameters, 256K context window
Gemma 4 26B-A4B: Mixture-of-experts design with 25.2B total parameters but only 3.8B active per token, 256K context window
Gemma 4 E2B: Compact model with 5.1B total parameters (2.3B effective), 128K context window

All three variants support text and image input, native function calling, and over 35 languages. According to AWS, independent benchmarks from Artificial Analysis report an Intelligence Index of 39 for Gemma 4 31B, compared to a median of 15 in the 4B-40B open-weights class.

Technical architecture

The models use hybrid attention that interleaves local and global attention to maintain long context support while reducing memory footprint. The 26B-A4B variant activates only 3.8B parameters per token despite having 25.2B total, delivering what AWS describes as "4B-class cost and latency with the knowledge capacity of a larger model."

The E2B variant uses Per-Layer Embeddings (PLE) to keep its effective parameter count at 2.3B of 5.1B total parameters.

Built-in reasoning mode

All Gemma 4 variants include a built-in reasoning mode that, when enabled, emits the model's internal thought process before producing the final answer. AWS documentation notes that in multi-turn conversations, only final answers from previous turns should be sent back to the model, not their reasoning items, as "replaying prior reasoning back to the model can degrade its responses."

Service access

The models are accessed through Amazon Bedrock's bedrock-mantle endpoint, which uses an OpenAI-compatible API. The endpoint URL is https://bedrock-mantle.{region}.api.aws/openai/v1 and supports both Chat Completions and Responses APIs.

All three variants are available in Standard, Priority, and Flex service tiers. AWS states that prompts and completions are not used to train any models and content is not shared with third parties.

The models are released under the Apache 2.0 license, allowing independent evaluation of model architecture and training methodology.

What this means

Gemma 4's availability on Bedrock gives enterprises access to competitive open-weight models through AWS infrastructure without managing inference stacks. The MoE variant's 3.8B active parameters at 25.2B total capacity represents a meaningful efficiency gain for high-throughput workloads. The 256K context window matches or exceeds most competing models, though pricing details were not disclosed in the announcement, making direct cost comparisons premature.

Source: aws.amazon.com ↗

Gemma Google DeepMind Amazon Bedrock AWS Open-weight models Mixture-of-experts Multimodal Reasoning

model releaseJuly 30, 2026

Moonshot AI Releases Kimi K3, a 2.8 Trillion Parameter Open-Weight Model; AWS Publishes Deployment Guide

Moonshot AI released Kimi K3 on July 27, 2026, a 2.8 trillion parameter Mixture-of-Experts model with a 1 million token context window and native multimodal support. AWS has published a deployment guide covering SageMaker HyperPod and Amazon EKS using ml.p6-b300.48xlarge instances with 8 NVIDIA B300 Blackwell Ultra GPUs.

model releaseJuly 29, 2026

Unsloth Releases GGUF Quantizations of Kimi K3, a 2.8T-Parameter Open-Weight MoE Model

Unsloth has released GGUF quantizations of Kimi K3, a 2.8-trillion-parameter open-weight Mixture-of-Experts model from Moonshot AI with a 1-million-token context window and native vision support. The largest lossless quantization (Q8) weighs in at 1.56TB.

model releaseJuly 28, 2026

Moonshot AI Releases Kimi K3: 2.8T-Parameter Open-Weight Model with 1M-Token Context, Now Available via Unsloth Quantiza

Moonshot AI has released Kimi K3, a 2.8-trillion-parameter open-weight mixture-of-experts model with a 1-million-token context window and native multimodal support. Unsloth has published Dynamic 2.0 quantized versions on Hugging Face, claiming improved accuracy over other quantization methods.