Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%

TL;DR

Microsoft released FastContext-1.0, a lightweight repository-exploration subagent for LLM coding agents spanning 4B to 30B parameters. The model reduced main-agent token consumption by up to 60% while improving end-to-end resolution rates by up to 5.5% on SWE-bench Pro when integrated with agents like GPT-5.4 and GLM-5.1.

June 15, 2026 · 6:51 PM2 min read

FastContext-1.0-4B-SFT — Quick Specs

Context window262K tokens

Compare FastContext-1.0-4B-SFT with other models →

Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%

Microsoft released FastContext-1.0, a dedicated repository-exploration subagent designed to offload code search tasks from primary coding agents. The model family includes variants at 4B and 30B parameters, with the 4B reinforcement-learning version (FC-4B-RL) matching or exceeding the larger 30B model on several benchmarks.

Architecture and Design

FastContext addresses a core inefficiency in modern coding agents: according to Microsoft's analysis of GPT-5.4 trajectories, repository exploration consumes 56.2% of all tool-use turns and 46.5% of total tokens. The subagent architecture separates exploration from problem-solving — the main agent queries FastContext, which executes parallel read-only operations (READ, GLOB, GREP) and returns focused file paths and line ranges.

The model supports context windows up to 262,144 tokens and is built on Qwen3-4B-Instruct (4B variants) and Qwen3-Coder-30B-A3B (30B variant) backbones. Available variants include FC-4B-SFT, FC-4B-RL (deployment targets), and FC-30B-SFT (scaling reference).

Performance Metrics

Integrating FastContext into Mini-SWE-Agent delivered measurable improvements across three benchmarks:

SWE-bench Pro results (most challenging):

GPT-5.4 + FC-4B-RL: 78.3% resolution (+5.5 points), 338k tokens (-26.0%)
GLM-5.1 + FC-4B-RL: 22.5% resolution (+5.0 points), 2.21M tokens (-17.9%)
Kimi-K2.6 + FC-4B-RL: 33.5% resolution (+2.5 points), 2.16M tokens (-9.4%)

Token reduction extremes:

GPT-5.4 on SWE-QA: 49.8% fewer tokens (210k vs. 418k)
GPT-5.4 on SWE-bench Multilingual: 50.7% fewer tokens (206k vs. 418k)

The compact 4B-RL model consistently outperformed the 30B-SFT variant despite having 7.5× fewer parameters — on GLM-5.1 SWE-bench Pro, FC-4B-RL achieved 22.5% versus 20.0% for FC-30B-SFT.

Training Methodology

Microsoft trained FastContext in two stages. The supervised fine-tuning (SFT) phase used three trace types: parallel_toolcalls for broad first-turn search, multiturn_traj for multi-turn evidence gathering, and linerange for citation generation. The reinforcement learning (RL) stage employed GRPO optimization with a reward function combining file-level F1, line-level F1, bounded parallel exploration bonuses, and format penalties.

Technical Details

The model operates through an internal exploration loop: query understanding translates issues into search intents, parallel tool calling issues multiple READ/GLOB/GREP operations simultaneously, observation-driven refinement guides subsequent searches, and final citations return compact file-path and line-range lists.

FastContext can be deployed via SGLang or similar OpenAI-compatible servers. The model exposes only three read-only tools and operates as an on-demand subagent invoked by the main coding agent.

What This Means

FastContext demonstrates that specialized subagents can outperform monolithic coding agents on specific tasks while reducing computational overhead. The 4B model's ability to match 30B performance at one-seventh the parameter count suggests effective reinforcement learning can compensate for model size in narrow domains. For production deployments, the 60% token reduction directly translates to lower API costs and faster response times. The architecture's separation of concerns — exploration versus problem-solving — may become a standard pattern for complex agent workflows.

The model and training code are available under MIT license at https://github.com/microsoft/fastcontext.

Source: huggingface.co ↗

microsoft coding-agents repository-search reinforcement-learning swe-bench qwen open-source

product updateJuly 30, 2026

Microsoft Confirms Copilot 'Super App' Merging Chat, Code, and Agents Launching This Year

Microsoft CEO Satya Nadella confirmed during an earnings call that a Copilot 'super app' merging chat, code, Cowork, and Autopilots will launch this year for both consumer and commercial users. The announcement follows OpenAI's own super app rollout, which the company has admitted is 'kind of a mess.'

model releaseJuly 30, 2026

Microsoft AI Shifts Strategy to Cheap Specialist Models Over Frontier Chasing

Microsoft AI CEO Mustafa Suleyman says the company is prioritizing token efficiency and compact, single-purpose models over general-purpose frontier systems. New models MAI-Cyber-1-Flash and MAI-Image-2.5-Flash claim strong cost-performance gains, but rely on an orchestration layer that still routes hard tasks to OpenAI's reasoning models.

product updateJuly 29, 2026

Microsoft Confirms Copilot 'Super App' Merging Chat, Code, and Agents Ships This Year

Microsoft CEO Satya Nadella confirmed during a Wednesday earnings call that the company is merging Copilot chat, GitHub Copilot coding features, Cowork, and Autopilot agents into a single 'super app' launching this year. The move mirrors OpenAI's recent ChatGPT Work app, which combines ChatGPT and Codex.

model releaseJuly 29, 2026

Microsoft Releases Mage-VL, a 4B-Parameter Codec-Native Streaming Vision-Language Model

Microsoft has released Mage-VL, a codec-native multimodal foundation model built on a from-scratch 4B-parameter visual encoder paired with Qwen3-4B-Instruct-2507. The model claims up to 3.5x inference speedup over uniform frame sampling and outperforms Qwen3-VL-4B on video and temporal-grounding benchmarks, according to Microsoft.

Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%

FastContext-1.0-4B-SFT — Quick Specs

Microsoft Releases FastContext-1.0: 4B-Parameter Repository Explorer Cuts Coding Agent Token Use by 60%

Architecture and Design

Performance Metrics

Training Methodology

Technical Details

What This Means

Related Articles

Microsoft Confirms Copilot 'Super App' Merging Chat, Code, and Agents Launching This Year

Microsoft AI Shifts Strategy to Cheap Specialist Models Over Frontier Chasing

Microsoft Confirms Copilot 'Super App' Merging Chat, Code, and Agents Ships This Year

Microsoft Releases Mage-VL, a 4B-Parameter Codec-Native Streaming Vision-Language Model

Comments