Chroma releases Context-1, a 20B parameter retrieval agent for complex multi-hop search

TL;DR

Chroma has released Context-1, a 20B parameter Mixture of Experts model trained specifically for retrieval tasks that require multi-hop reasoning. The model decomposes complex queries into subqueries, performs parallel tool calls, and actively prunes its own context mid-search. Chroma reports performance comparable to frontier models at a fraction of the cost, with inference up to 10x faster.

Chroma Releases Context-1: A Specialized Retrieval Agent

Chroma has released Context-1, a 20B parameter agentic search model designed to serve as a retrieval subagent alongside frontier reasoning models. Unlike general-purpose LLMs, Context-1 is purpose-built for complex multi-hop retrieval tasks where a query requires iterative decomposition and selective document gathering.

Model Architecture and Training

Context-1 is built on gpt-oss-20b, a Mixture of Experts base model. It was trained using supervised fine-tuning (SFT) combined with reinforcement learning (CISPO) applied through a curriculum-based approach. Weights are available in BF16 precision, with an MXFP4 quantized checkpoint coming soon.

The model was trained on diverse domains including web search, legal documents, and financial data. Chroma reports that it generalizes to held-out domains and to public benchmarks including BrowseComp-Plus, SealQA, FRAMES, and HLE.

Key Technical Capabilities

Query Decomposition: Context-1 breaks down complex, multi-constraint questions into targeted subqueries rather than attempting to answer them directly.

Parallel Tool Calling: The model averages 2.56 tool calls per turn, reducing the total number of search iterations and lowering end-to-end latency compared to sequential approaches.

Self-Editing Context: Perhaps the most distinctive feature is the model's ability to selectively prune irrelevant documents mid-search. Chroma reports a pruning accuracy of 0.94, allowing the model to maintain retrieval quality over long search horizons while operating within a bounded context window.

Cross-Domain Generalization: Training across multiple verticals enables the model to handle domains outside its training distribution.
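
To make the decomposition and parallel tool calling ideas concrete, here is a minimal sketch of one search turn that fans several subqueries out concurrently. It is an illustration only, not Chroma's implementation: the `CORPUS`, the `search` tool, and `run_turn` are hypothetical names, and the subqueries are hard-coded where Context-1 would generate them itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory corpus standing in for a real search backend.
CORPUS = {
    "doc-1": "Acme Corp was founded in 1999 in Austin.",
    "doc-2": "The CEO of Acme Corp is Jane Doe.",
    "doc-3": "Jane Doe studied physics at Rice University.",
}


def search(subquery: str) -> list[str]:
    """Hypothetical search tool: ids of documents containing every term."""
    terms = subquery.lower().split()
    return sorted(
        doc_id
        for doc_id, text in CORPUS.items()
        if all(term in text.lower() for term in terms)
    )


def run_turn(subqueries: list[str]) -> dict[str, list[str]]:
    """Issue all subqueries of one turn concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(subqueries))) as pool:
        results = list(pool.map(search, subqueries))  # preserves input order
    return dict(zip(subqueries, results))


# Hard-coded decomposition of "Where did the CEO of the company
# founded in 1999 study?"; the model would produce these itself.
subqueries = ["acme founded", "acme ceo", "jane doe studied"]
turn_results = run_turn(subqueries)
```

Issuing the three lookups in one turn rather than three sequential turns is what reduces the number of search iterations; with a real network-bound backend the calls would also overlap in time.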

Critical Limitation: Agent Harness Required

A significant caveat: Context-1 requires a specific agent harness to function as described in the technical report. This harness manages tool execution, token budgets, context pruning, and deduplication. Chroma has not yet publicly released the harness, meaning users cannot currently reproduce the reported performance metrics by running the model directly.

Chroma states the harness will be released "soon" and that its technical report describes the harness design in detail. The open-source community will need to wait for this release to fully evaluate the model's actual performance.
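
For teams thinking about what such a harness has to do, the following is a rough sketch of three of the responsibilities named above: deduplication, a token budget, and context pruning. Everything in it is an assumption made for illustration, since the real harness is unreleased: the `Doc` and `ContextWindow` types, the whitespace token count, and especially the numeric `relevance` score, which stands in for pruning decisions that Context-1 makes itself.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    relevance: float  # hypothetical score; the real model decides what to prune


def count_tokens(text: str) -> int:
    """Crude whitespace token count, standing in for a real tokenizer."""
    return len(text.split())


@dataclass
class ContextWindow:
    budget: int  # maximum number of tokens allowed in context
    docs: dict[str, Doc] = field(default_factory=dict)
    seen: set[str] = field(default_factory=set)  # every id ever added

    def used(self) -> int:
        return sum(count_tokens(d.text) for d in self.docs.values())

    def add(self, doc: Doc) -> bool:
        """Add a document unless it was already retrieved (deduplication)."""
        if doc.doc_id in self.seen:
            return False
        self.seen.add(doc.doc_id)
        self.docs[doc.doc_id] = doc
        return True

    def prune(self) -> list[str]:
        """Drop the least relevant documents until the context fits the budget."""
        removed = []
        for doc in sorted(self.docs.values(), key=lambda d: d.relevance):
            if self.used() <= self.budget:
                break
            del self.docs[doc.doc_id]
            removed.append(doc.doc_id)
        return removed


ctx = ContextWindow(budget=10)
ctx.add(Doc("a", "alpha beta gamma delta", 0.9))   # 4 tokens
ctx.add(Doc("b", "one two three four five", 0.2))  # 5 tokens
ctx.add(Doc("a", "alpha beta gamma delta", 0.9))   # duplicate, ignored
ctx.add(Doc("c", "x y z", 0.6))                    # 3 tokens, total now 12
removed = ctx.prune()                              # drops "b" to get under 10
```

Tracking `seen` separately from `docs` means a pruned document is not re-admitted if a later search returns it again, which is one plausible reading of how pruning and deduplication would interact.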

Pricing and Availability

The model is available on Hugging Face under the Apache 2.0 license. As an open-weight model intended for self-hosting or integration into applications, it has no published pricing, and no inference provider has deployed it yet.

What This Means

Context-1 reflects a shift toward specialized, task-optimized models over general-purpose scaling. Instead of making frontier models handle every task, Chroma argues for delegating retrieval to a smaller, cheaper, faster specialist agent. The reported speedup of up to 10x and the lower cost suggest potential value for organizations building retrieval-augmented generation (RAG) systems.

However, the unavailability of the agent harness is a substantial gap. The technical report's performance claims cannot be independently verified until Chroma releases the harness and evaluation code. Teams considering adoption should plan for either waiting for the full release or building their own harness implementation based on the technical report's specifications.
