Step-3.7-Flash

Name: Step-3.7-Flash
Author: StepFun

StepFun 🇨🇳 China

Status: active


Context window: 256K tokens

Version History

3.7-flash · major · June 1, 2026

Initial GGUF release of Step-3.7-Flash in seven variants, ranging from BF16 (394GB) to IQ3_XXS (76GB) and aimed at local deployment on consumer hardware with 128GB of memory.
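
The file sizes quoted in this release note can be turned into a rough bits-per-weight figure, which helps when judging which variant might fit a given machine. The following is a minimal sketch, not an official sizing tool. It assumes 198B parameters (the figure used in most of the coverage here), 1 GB = 1e9 bytes, and that the whole file consists of weights. The function names and the 128GB default are illustrative choices, and the fit check ignores KV cache and runtime overhead.

```python
# Rough sizing arithmetic for the GGUF variants mentioned above.
# Assumptions (not from StepFun): 198e9 parameters, 1 GB = 1e9 bytes,
# and the entire file is model weights (metadata is ignored).

PARAMS = 198e9  # total parameter count, as reported in the coverage


def bits_per_weight(file_size_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a file of this size."""
    return file_size_gb * 1e9 * 8 / params


def fits_in_memory(file_size_gb: float, memory_gb: float = 128.0) -> bool:
    """Crude check: does the file alone fit in memory?

    Ignores KV cache, activations, and OS overhead, so a True result
    is necessary but not sufficient for the model to actually run.
    """
    return file_size_gb < memory_gb


if __name__ == "__main__":
    for name, size_gb in [("BF16", 394.0), ("IQ3_XXS", 76.0)]:
        print(
            f"{name}: {bits_per_weight(size_gb):.2f} bits/weight, "
            f"fits in 128GB: {fits_in_memory(size_gb)}"
        )
```

Under these assumptions the BF16 file works out to roughly 16 bits per weight, as expected for an unquantized 16-bit format, and the IQ3_XXS file to roughly 3 bits per weight.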

3.7 · major · May 29, 2026

Initial release of Step-3.7-Flash, a 198B-parameter sparse MoE vision-language model with a 256K context window, three reasoning levels, and a production-focused architecture delivering up to 400 tokens/sec.

Coverage

model release · StepFun

StepFun Releases Step-3.7-Flash: 198B-Parameter Sparse MoE Model With 256K Context in GGUF Format

StepFun has released Step-3.7-Flash, a 198B-parameter sparse Mixture-of-Experts vision-language model that activates approximately 11B parameters per token. The model supports a 256K context window and native image understanding via a 1.8B-parameter vision encoder, and it offers three selectable reasoning levels.

June 1, 2026 · 8:06 AM · 2 min read

StepFun Step-3.7-Flash MoE

model release

StepFun releases Step-3.7-Flash: 198B-parameter MoE model with 256K context at $0.20/M input tokens

StepFun has released Step-3.7-Flash, a 198B-parameter sparse Mixture-of-Experts vision-language model that activates 11B parameters per token and delivers up to 400 tokens per second. The model supports a 256K context window and three selectable reasoning levels, and it is priced at $0.20 per million input tokens (cache miss) and $1.15 per million output tokens.

May 29, 2026 · 12:51 PM · 2 min read

stepfun mixture-of-experts vision-language-model
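
The per-token prices quoted in this item translate into a request cost with simple arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, and only the cache-miss input rate is modeled, because no cache-hit price appears in the coverage.

```python
# Back-of-the-envelope cost estimate from the list prices quoted above.
# Assumption: every input token is billed at the cache-miss rate;
# the cache-hit rate is not given in the coverage and is not modeled.

INPUT_USD_PER_M = 0.20   # USD per 1M input tokens (cache miss)
OUTPUT_USD_PER_M = 1.15  # USD per 1M output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # A long-context request: 200K tokens in, 4K tokens out.
    print(f"${estimate_cost_usd(200_000, 4_000):.4f}")
```

For a request with 200K input tokens and 4K output tokens, this comes to about $0.04 for input plus about $0.005 for output, so input dominates the bill for long-context requests with short answers.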

model release · StepFun

StepFun launches Step 3.7 Flash: 196B MoE model with 256K context and adjustable reasoning levels at $0.20/$1.15 per 1M input/output tokens

StepFun has released Step 3.7 Flash, a 196B-parameter Mixture-of-Experts model that activates approximately 11B parameters per token. The multimodal model supports a 256K context window and introduces selectable reasoning levels (high/medium/low), priced at $0.20 per 1M input tokens and $1.15 per 1M output tokens.

May 29, 2026 · 12:20 AM · 2 min read

StepFun Step 3.7 Flash Mixture-of-Experts