Step-3.5-Flash-Base: StepFun releases lightweight text generation model

TL;DR

StepFun has released Step-3.5-Flash-Base, a text generation model available on Hugging Face under the Apache 2.0 license. The model is part of the Step 3.5 series and focuses on efficient inference.

March 5, 2026 · 8:50 AM

Step-3.5-Flash-Base: Quick Specs

Context window: 262K tokens
Input: $0.10 per 1M tokens
Output: $0.30 per 1M tokens
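To make the listed rates concrete, here is a minimal sketch of the per-request arithmetic. The two prices come from the Quick Specs above; the function name and the example workload are illustrative assumptions, not part of any StepFun API.

```python
# Prices copied from the Quick Specs above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: a 200,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
```

At these rates the prompt dominates the bill for long-context requests, since input tokens usually far outnumber output tokens.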

StepFun Releases Step-3.5-Flash-Base

StepFun has released Step-3.5-Flash-Base, a text generation model designed for efficient inference. The model is available on Hugging Face as an open-source release under the Apache 2.0 license.

Model Details

The Step-3.5-Flash-Base model is positioned as a lightweight variant in StepFun's Step 3.5 series. The "Flash" designation indicates optimization for speed and reduced computational requirements compared to full-scale variants.

The model uses a standard transformer architecture, and its weights are distributed in the SafeTensors format for fast, safe loading. It is listed as available for deployment across multiple regions, including US-based infrastructure.

Technical Specifications

The model is available on Hugging Face with the following characteristics:

Format: SafeTensors (optimized tensor serialization)
License: Apache 2.0 (permissive open-source)
Architecture: Transformer-based text generation
Pipeline: Text generation
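As a rough illustration of why SafeTensors loads quickly, a file in this format begins with an 8-byte little-endian header length followed by a JSON header describing each tensor, so metadata can be read without touching the weights. The sketch below uses only the standard library and writes its own tiny demo file. The tensor name "demo.weight" is hypothetical and says nothing about the actual layout of Step-3.5-Flash-Base.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a SafeTensors file without loading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON describing each tensor.
        return json.loads(f.read(header_len).decode("utf-8"))


def write_demo_file(path: str) -> None:
    """Write a tiny SafeTensors-style file holding one 2x2 float32 tensor of zeros."""
    data = struct.pack("<4f", 0.0, 0.0, 0.0, 0.0)  # 4 floats x 4 bytes = 16 bytes
    header = {
        "demo.weight": {
            "dtype": "F32",
            "shape": [2, 2],
            "data_offsets": [0, len(data)],
        }
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(data)


def demo() -> dict:
    """Round-trip the demo file and return its parsed header."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.safetensors")
        write_demo_file(path)
        return read_safetensors_header(path)


if __name__ == "__main__":
    print(demo())
```

In practice the safetensors library handles this parsing; the point here is only that shapes and dtypes are available from the header alone.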

At release, the model had 58 likes and 135 downloads on Hugging Face, a modest early signal of community interest.

Research Background

The release references two research papers (arXiv:2602.10604 and arXiv:2601.05593), which suggests the model incorporates recent work from StepFun's research efforts.

Access and Availability

Step-3.5-Flash-Base is available for immediate download from Hugging Face. The Apache 2.0 license permits commercial use, modification, and distribution, provided the license text and existing copyright notices are retained.

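The repository ships custom code, meaning some of the model logic lives in the repository itself rather than in a stock library implementation. This may point to specialized processing logic or optimized inference paths beyond standard transformer implementations, though the listing does not say which.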
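Repositories that ship custom code are typically loaded through the transformers library with trust_remote_code=True, which executes Python from the repository on your machine. The sketch below shows one cautious way to do that, under several assumptions: the repository id is a placeholder (the article does not give it), AutoModelForCausalLM is assumed to be the right class because the pipeline is text generation, and build_load_kwargs is a hypothetical helper, not a library function.

```python
from typing import Optional

# Placeholder repository id: replace ORG_NAME with the real organization
# shown on the Hugging Face listing. It is not given in the article.
REPO_ID = "ORG_NAME/Step-3.5-Flash-Base"


def build_load_kwargs(allow_custom_code: bool, revision: Optional[str] = None) -> dict:
    """Build keyword arguments for from_pretrained (hypothetical helper)."""
    kwargs = {"trust_remote_code": allow_custom_code}
    if revision is not None:
        # Pinning a commit hash keeps the executed remote code from changing later.
        kwargs["revision"] = revision
    return kwargs


def load(repo_id: str = REPO_ID, revision: Optional[str] = None):
    """Download and load the tokenizer and model. Runs code from the repository."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = build_load_kwargs(allow_custom_code=True, revision=revision)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **kwargs)
    return tokenizer, model
```

Reviewing the repository's Python files and pinning a specific revision before calling load is a sensible precaution whenever remote code is trusted.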

What This Means

StepFun's release of Step-3.5-Flash-Base represents continued activity in the efficiency-focused segment of LLM development. The "Flash" branding suggests a deliberate positioning toward cost-effective inference, a key consideration for production deployments where computational overhead directly affects operating costs. The open-source Apache 2.0 release indicates a strategy of building adoption through community distribution rather than API-gated access. Early download metrics suggest interest from practitioners seeking efficient alternatives to larger models.

Source: huggingface.co
