model releaseStepFun

StepFun launches Step 3.7 Flash: 196B MoE model with 256K context and adjustable reasoning levels at $0.20/$1.15 per 1M

TL;DR

StepFun has released Step 3.7 Flash, a 196B-parameter Mixture-of-Experts model that activates approximately 11B parameters per token. The multimodal model supports a 256K context window and introduces selectable reasoning levels (high/medium/low), priced at $0.20 per 1M input tokens and $1.15 per 1M output tokens.

May 29, 2026 · 12:20 AM2 min read

Step-3.7-Flash — Quick Specs

Context window256K tokens

Compare Step-3.7-Flash with other models →

StepFun Launches Step 3.7 Flash with Adjustable Reasoning Levels

StepFun has released Step 3.7 Flash, a 196B-parameter Mixture-of-Experts (MoE) model that activates roughly 11B parameters per token during inference. The model includes native image and video understanding capabilities through an integrated vision encoder.

Technical Specifications

Step 3.7 Flash supports a 256K token context window and is priced at $0.20 per 1M input tokens and $1.15 per 1M output tokens. The model was released on May 28, 2025, according to OpenRouter's listing.

The architecture combines a 196B-parameter language backbone with a vision encoder, making it StepFun's latest multimodal offering. By activating only 11B parameters per token through its MoE design, the model aims to balance performance with computational efficiency.

Selectable Reasoning Levels

A distinctive feature is the model's three selectable reasoning levels—high, medium, and low—allowing developers to trade off between processing speed, cost, and reasoning depth based on specific use cases. This gives callers direct control over how the model allocates compute resources per query.

Target Use Cases

According to StepFun, Step 3.7 Flash is designed for coding tasks, agentic workflows, structured output generation, and long-context productivity applications. The 256K context window positions it for document analysis, extended code review, and multi-turn conversations requiring substantial memory.

The model is currently available through OpenRouter, which routes requests across multiple providers to handle different prompt sizes and parameters.

What This Means

Step 3.7 Flash represents StepFun's entry into the competitive space of large-context multimodal models, directly competing with offerings from Anthropic, Google, and others in the 200K+ context range. The adjustable reasoning levels are a notable differentiation—most models offer fixed inference patterns, while this approach lets developers optimize for their specific latency and quality requirements. The $0.20/$1.15 pricing puts it in the mid-tier range, though real-world performance benchmarks will determine whether the selectable reasoning modes deliver meaningful value beyond standard inference optimization.

Source: openrouter.ai ↗

StepFun Step 3.7 Flash Mixture-of-Experts MoE multimodal 256K context adjustable reasoning model release

model releaseJuly 14, 2026

Google releases Gemma 4 E2B, optimized to run natively on Pixel 10's Tensor G5 TPU

Google has released Gemma 4 E2B for TPU, a variant of its open-source Gemma 4 model optimized to run natively on the Tensor G5 chip in Pixel 10 devices. The multimodal model enables completely offline AI chat, image recognition, and audio transcription on Pixel 10, 10 Pro, 10 Pro XL, and 10 Pro Fold.

model releaseJuly 13, 2026

OpenAI GPT-5.6 Sol, Terra, and Luna launch on Amazon Bedrock with 80-point Coding Agent Index score

OpenAI's GPT-5.6 model family is now generally available on Amazon Bedrock, introducing a three-tier system: Sol (flagship reasoning), Terra (balanced production), and Luna (fast inference). According to OpenAI, Sol scores 80 points on the Artificial Analysis Coding Agent Index and 73.5% on ExploitBench, establishing new benchmarks while using less than half the output tokens of competing models.

model releaseJuly 9, 2026

OpenAI releases GPT-5.6 with three model variants, claims 80-point Coding Agent Index score for Sol

OpenAI released GPT-5.6 in three variants: Sol ($5 input/$30 output per 1M tokens), Terra ($2.50/$15), and Luna ($1/$6). According to OpenAI, Sol achieves an 80-point score on the Artificial Analysis Coding Agent Index, 2.8 points above Anthropic's Fable 5, while using less than half the output tokens and costing one-third less.