Step-3.5-Flash-Base: StepFun releases lightweight text generation model

TL;DR

StepFun has released Step-3.5-Flash-Base, a text generation model available on Hugging Face under Apache 2.0 license. The model is part of the Step 3.5 series and focuses on efficient inference.

Step-3.5-Flash-Base — Quick Specs

Context window: 262K tokens
Input: $0.1/1M tokens
Output: $0.3/1M tokens
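
As a rough illustration of what the listed rates mean per request, the arithmetic can be sketched as follows. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes the listed prices are applied linearly per token with no minimums, caching discounts, or tiering, none of which the listing specifies.

```python
# Listed rates from the quick specs, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # A long prompt with a short completion.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
```

Under these assumptions, a 200,000-token prompt with a 2,000-token completion costs about $0.0206, with almost all of it coming from the input side.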

StepFun Releases Step-3.5-Flash-Base

StepFun has released Step-3.5-Flash-Base, a text generation model designed for efficient inference. The model is available on Hugging Face as an open-source release under the Apache 2.0 license.

Model Details

The Step-3.5-Flash-Base model is positioned as a lightweight variant in StepFun's Step 3.5 series. The "Flash" designation indicates optimization for speed and reduced computational requirements compared to full-scale variants.

The model uses a transformer architecture, and its weights are distributed in SafeTensors format, which is designed for safe and fast loading. It is available for deployment across multiple regions, including US-based infrastructure.

Technical Specifications

The model is available on Hugging Face with the following characteristics:

  • Format: SafeTensors (optimized tensor serialization)
  • License: Apache 2.0 (permissive open-source)
  • Architecture: Transformer-based text generation
  • Pipeline: Text generation
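
To make the SafeTensors entry above more concrete, here is a minimal sketch of the file layout using only the Python standard library: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. The tensor name `demo.weight` and the helper names are made up for illustration; real checkpoints are normally read with the `safetensors` library rather than by hand.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_tiny_safetensors(path: Path) -> None:
    """Write a one-tensor file by hand: header length, JSON header, raw data."""
    data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)  # a 2x2 float32 tensor, 16 bytes
    header = {
        "demo.weight": {  # hypothetical tensor name
            "dtype": "F32",
            "shape": [2, 2],
            "data_offsets": [0, len(data)],  # byte range within the data section
        }
    }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    path.write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + data)


def read_safetensors_header(path: Path) -> dict:
    """Read only the JSON header, without touching the tensor data."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def demo() -> dict:
    """Round-trip a tiny file through a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "tiny.safetensors"
        write_tiny_safetensors(path)
        return read_safetensors_header(path)


if __name__ == "__main__":
    print(demo())
```

Because tensor names, dtypes, shapes, and byte offsets all live in the header, a loader can inspect a checkpoint or memory-map individual tensors without executing any code stored in the file.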

At the time of writing, the model had 58 likes and 135 downloads on Hugging Face, a sign of early interest from the community.

Research Background

The release references two research papers (arXiv:2602.10604 and arXiv:2601.05593), suggesting the model incorporates recent algorithmic improvements from StepFun's research efforts.

Access and Availability

Step-3.5-Flash-Base is available for immediate download from Hugging Face. The Apache 2.0 license permits commercial use, modification, and distribution with appropriate attribution.

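The repository includes custom code, which may reflect optimized inference kernels or specialized processing logic beyond standard transformer implementations. On Hugging Face, custom code of this kind generally has to be explicitly trusted by the user before it is loaded.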
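
In practice, Hugging Face repositories that ship custom code are usually loaded through the `transformers` library with `trust_remote_code=True`. The sketch below shows that pattern under stated assumptions: the repository id `stepfun-ai/Step-3.5-Flash-Base` is a guess that the article does not confirm, the helper names are hypothetical, and the hardware needed to actually load the weights is not covered here.

```python
from typing import Optional

# Assumed repository id; the article does not state the exact path.
MODEL_ID = "stepfun-ai/Step-3.5-Flash-Base"


def build_load_kwargs(allow_custom_code: bool, revision: Optional[str] = None) -> dict:
    """Collect from_pretrained keyword arguments for a repo with custom code."""
    kwargs = {"trust_remote_code": allow_custom_code}
    if revision is not None:
        # Pinning a commit means the custom code that was reviewed is the code that runs.
        kwargs["revision"] = revision
    return kwargs


def load_model(model_id: str = MODEL_ID, allow_custom_code: bool = False,
               revision: Optional[str] = None):
    """Load tokenizer and model; requires `transformers` and network access."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = build_load_kwargs(allow_custom_code, revision)
    tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    return tokenizer, model
```

Defaulting `allow_custom_code` to `False` is a deliberate choice in this sketch: remote code runs with the user's privileges, so it is worth reading the repository's Python files and pinning a revision before opting in.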

What This Means

StepFun's release of Step-3.5-Flash-Base represents continued activity in the efficiency-focused segment of LLM development. The "Flash" branding suggests a deliberate positioning toward cost-effective inference, a key consideration for production deployments where computational overhead directly drives operating costs. The open-source Apache 2.0 release points to a strategy of building adoption through community distribution rather than API-gated access. Early download metrics suggest interest from practitioners seeking efficient alternatives to larger models.
