model releaseMicrosoft

Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%

TL;DR

Microsoft released FastContext-1.0, a lightweight repository-exploration subagent for LLM coding agents spanning 4B to 30B parameters. The model reduced main-agent token consumption by up to 60% while improving end-to-end resolution rates by up to 5.5% on SWE-bench Pro when integrated with agents like GPT-5.4 and GLM-5.1.

2 min read
0

Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%

Microsoft released FastContext-1.0, a dedicated repository-exploration subagent designed to offload code search tasks from primary coding agents. The model family includes variants at 4B and 30B parameters, with the 4B reinforcement-learning version (FC-4B-RL) matching or exceeding the larger 30B model on several benchmarks.

Architecture and Design

FastContext addresses a core inefficiency in modern coding agents: according to Microsoft's analysis of GPT-5.4 trajectories, repository exploration consumes 56.2% of all tool-use turns and 46.5% of total tokens. The subagent architecture separates exploration from problem-solving — the main agent queries FastContext, which executes parallel read-only operations (READ, GLOB, GREP) and returns focused file paths and line ranges.

The model supports context windows up to 262,144 tokens and is built on Qwen3-4B-Instruct (4B variants) and Qwen3-Coder-30B-A3B (30B variant) backbones. Available variants include FC-4B-SFT, FC-4B-RL (deployment targets), and FC-30B-SFT (scaling reference).

Performance Metrics

Integrating FastContext into Mini-SWE-Agent delivered measurable improvements across three benchmarks:

SWE-bench Pro results (most challenging):

  • GPT-5.4 + FC-4B-RL: 78.3% resolution (+5.5 points), 338k tokens (-26.0%)
  • GLM-5.1 + FC-4B-RL: 22.5% resolution (+5.0 points), 2.21M tokens (-17.9%)
  • Kimi-K2.6 + FC-4B-RL: 33.5% resolution (+2.5 points), 2.16M tokens (-9.4%)

Token reduction extremes:

  • GPT-5.4 on SWE-QA: 49.8% fewer tokens (210k vs. 418k)
  • GPT-5.4 on SWE-bench Multilingual: 50.7% fewer tokens (206k vs. 418k)

The compact 4B-RL model consistently outperformed the 30B-SFT variant despite having 7.5× fewer parameters — on GLM-5.1 SWE-bench Pro, FC-4B-RL achieved 22.5% versus 20.0% for FC-30B-SFT.

Training Methodology

Microsoft trained FastContext in two stages. The supervised fine-tuning (SFT) phase used three trace types: parallel_toolcalls for broad first-turn search, multiturn_traj for multi-turn evidence gathering, and linerange for citation generation. The reinforcement learning (RL) stage employed GRPO optimization with a reward function combining file-level F1, line-level F1, bounded parallel exploration bonuses, and format penalties.

Technical Details

The model operates through an internal exploration loop: query understanding translates issues into search intents, parallel tool calling issues multiple READ/GLOB/GREP operations simultaneously, observation-driven refinement guides subsequent searches, and final citations return compact file-path and line-range lists.

FastContext can be deployed via SGLang or similar OpenAI-compatible servers. The model exposes only three read-only tools and operates as an on-demand subagent invoked by the main coding agent.

What This Means

FastContext demonstrates that specialized subagents can outperform monolithic coding agents on specific tasks while reducing computational overhead. The 4B model's ability to match 30B performance at one-seventh the parameter count suggests effective reinforcement learning can compensate for model size in narrow domains. For production deployments, the 60% token reduction directly translates to lower API costs and faster response times. The architecture's separation of concerns — exploration versus problem-solving — may become a standard pattern for complex agent workflows.

The model and training code are available under MIT license at https://github.com/microsoft/fastcontext.

Related Articles

product update

Microsoft restricts Claude Fable 5 internally over 30-day data retention requirement

Microsoft has restricted internal employee access to Anthropic's newly released Claude Fable 5 model while its legal teams evaluate the company's new data retention requirements. The model requires storing prompts and outputs for 30 days to operate safety classifiers, with some content potentially retained for up to two years if flagged for policy violations.

model release

Cohere Releases North Mini Code 1.0: 30B-Parameter MoE Model With 256K Context for Agentic Coding

Cohere Labs has released North Mini Code 1.0, a 30B-parameter sparse Mixture-of-Experts model with 3B active parameters and a 256K context window. The Apache 2.0-licensed model is optimized for agentic software engineering, featuring 128 experts with 8 activated per token, and trained specifically for tool use in coding tasks.

model release

Google DeepMind releases Gemma 4 12B: encoder-free multimodal model runs on 16GB RAM

Google DeepMind has released Gemma 4 12B, a 12-billion parameter multimodal model that runs locally on laptops with 16GB of RAM. The model eliminates separate vision and audio encoders, processing raw inputs directly through its language model backbone under an Apache 2.0 license.

model release

Amazon Bedrock adds Gemma 4 models with 256K context and built-in reasoning mode

Amazon Web Services today announced availability of Google DeepMind's Gemma 4 family on Amazon Bedrock. The open-weight models include three instruction-tuned variants spanning 2.3B to 30.7B parameters, with 256K context windows, multimodal input support, and built-in reasoning mode.

Comments

Loading...