Chroma releases Context-1, a 20B parameter retrieval agent for complex multi-hop search
Chroma has released Context-1, a 20B parameter Mixture of Experts model trained specifically for retrieval tasks that require multi-hop reasoning. The model decomposes complex queries into subqueries, performs parallel tool calls, and actively prunes its own context mid-search, achieving performance comparable to frontier models at a fraction of the cost and with up to 10x faster inference.
Chroma Releases Context-1: A Specialized Retrieval Agent
Chroma has released Context-1, a 20B parameter agentic search model designed to serve as a retrieval subagent alongside frontier reasoning models. Unlike general-purpose LLMs, Context-1 is purpose-built for complex multi-hop retrieval tasks where a query requires iterative decomposition and selective document gathering.
Model Architecture and Training
Context-1 is built on the gpt-oss-20b base model, a Mixture of Experts architecture. The model was trained with supervised fine-tuning (SFT) followed by reinforcement learning using CISPO in a curriculum-based setup. Weights are available in BF16 precision, with an MXFP4 quantized checkpoint coming soon.
The model was trained on diverse domains including web search, legal documents, and financial data, enabling it to generalize across held-out domains and public benchmarks including BrowseComp-Plus, SealQA, FRAMES, and HLE.
Key Technical Capabilities
Query Decomposition: Context-1 breaks down complex, multi-constraint questions into targeted subqueries rather than attempting to answer them directly.
Parallel Tool Calling: The model averages 2.56 tool calls per turn, reducing the total number of search iterations and lowering end-to-end latency compared to sequential approaches.
Self-Editing Context: Perhaps the most distinctive feature is the model's ability to selectively prune irrelevant documents mid-search. Chroma reports a pruning accuracy of 0.94, allowing the model to maintain retrieval quality over long search horizons while operating within a bounded context window.
Cross-Domain Generalization: Training across multiple verticals enables the model to handle domains outside its training distribution.
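Taken together, these capabilities describe a loop: decompose the question, search in parallel, then prune the accumulated context. The sketch below illustrates that pattern only; it is not Chroma's implementation, and the decomposition, search, and scoring functions are stand-ins for the model's actual behavior.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(query):
    # Stand-in for the model's query decomposition: a multi-constraint
    # question becomes targeted subqueries instead of one direct answer.
    return [
        "Which company released a 20B retrieval model?",
        "What base model was it trained from?",
    ]

def search(subquery):
    # Stand-in for a search tool call; returns documents with
    # relevance scores attached.
    return [{"text": f"result for: {subquery}", "score": 0.8},
            {"text": f"noise for: {subquery}", "score": 0.2}]

def run_turn(query, context, relevance_threshold=0.5):
    subqueries = decompose(query)
    # Parallel tool calls: issue all subqueries in one turn rather
    # than searching sequentially, cutting end-to-end latency.
    with ThreadPoolExecutor() as pool:
        results = pool.map(search, subqueries)
    for docs in results:
        context.extend(docs)
    # Self-editing context: drop documents judged irrelevant so the
    # context window stays bounded over long search horizons.
    return [d for d in context if d["score"] >= relevance_threshold]

context = run_turn("complex multi-hop question", [])
print(len(context))  # prints 2: only the relevant documents survive pruning
```

In a real agent, the pruning decision would itself be made by the model rather than by a fixed threshold; the threshold here only stands in for that judgment.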
Critical Limitation: Agent Harness Required
A significant caveat: Context-1 requires a specific agent harness to function as described in the technical report. This harness manages tool execution, token budgets, context pruning, and deduplication. Chroma has not yet publicly released the harness, meaning users cannot currently reproduce the reported performance metrics by running the model directly.
Chroma states the harness will be released "soon" and that its technical report describes the harness design in detail. The open-source community will need to wait for this release to fully evaluate the model's actual performance.
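Until the harness ships, its described responsibilities (tool execution, token budgets, context pruning, deduplication) can only be guessed at. A minimal sketch of two of those duties, budget enforcement and deduplication, might look like the following; every name here is illustrative, not taken from Chroma's unreleased code.

```python
def approx_tokens(doc):
    # Crude token estimate; a real harness would use the model's tokenizer.
    return len(doc.split())

def harness_step(retrieved, context, token_budget=2000):
    """Merge newly retrieved documents into the running context,
    skipping duplicates and stopping at the token budget."""
    seen = set(context)
    for doc in retrieved:
        if doc in seen:            # deduplication across turns
            continue
        cost = approx_tokens(doc)
        if sum(approx_tokens(d) for d in context) + cost > token_budget:
            break                  # stop before exceeding the budget
        context.append(doc)
        seen.add(doc)
    return context

ctx = harness_step(["alpha beta", "alpha beta", "gamma"], [])
print(ctx)  # prints ['alpha beta', 'gamma']: duplicate dropped, budget respected
```

Anyone reimplementing the harness from the technical report would also need to wire in actual tool execution and the model's own pruning calls, which this sketch omits.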
Pricing and Availability
The model is available on Hugging Face under the Apache 2.0 license. There is no pricing to report: Context-1 is an open-weight model intended for self-hosting or integration into applications, and no inference provider has deployed it yet.
What This Means
Context-1 represents a shift toward specialized, task-optimized models rather than general-purpose scaling. Instead of making a frontier model handle every task, Chroma argues for delegating retrieval to a smaller, cheaper, faster specialist agent. The reported 10x speed advantage and fraction-of-the-cost positioning suggest potential value for organizations building retrieval-augmented generation (RAG) systems.
However, the unavailable agent harness is a substantial gap: the technical report's performance claims cannot be independently verified until Chroma releases the harness and evaluation code. Teams considering adoption should either wait for the full release or build their own harness from the technical report's specifications.