Chroma releases Context-1, a 20B parameter retrieval agent for complex multi-hop search
Chroma has released Context-1, a 20B parameter Mixture of Experts model trained specifically for retrieval tasks that require multi-hop reasoning. The model decomposes complex queries into subqueries, performs parallel tool calls, and actively prunes its own context mid-search, achieving performance comparable to frontier models at a fraction of the cost and with up to 10x faster inference.
Chroma Releases Context-1: A Specialized Retrieval Agent
Chroma has released Context-1, a 20B parameter agentic search model designed to serve as a retrieval subagent alongside frontier reasoning models. Unlike general-purpose LLMs, Context-1 is purpose-built for complex multi-hop retrieval tasks where a query requires iterative decomposition and selective document gathering.
Model Architecture and Training
Context-1 is built on the gpt-oss-20b base model, a Mixture of Experts architecture. The model was trained with supervised fine-tuning (SFT) followed by reinforcement learning using CISPO in a curriculum-based setup. Weights are available in BF16 precision, with an MXFP4 quantized checkpoint coming soon.
The model was trained on diverse domains including web search, legal documents, and financial data, enabling it to generalize across held-out domains and public benchmarks including BrowseComp-Plus, SealQA, FRAMES, and HLE.
Key Technical Capabilities
Query Decomposition: Context-1 breaks down complex, multi-constraint questions into targeted subqueries rather than attempting to answer them directly.
Parallel Tool Calling: The model averages 2.56 tool calls per turn, reducing the total number of search iterations and lowering end-to-end latency compared to sequential approaches.
Self-Editing Context: Perhaps the most distinctive feature is the model's ability to selectively prune irrelevant documents mid-search. Chroma reports a pruning accuracy of 0.94, allowing the model to maintain retrieval quality over long search horizons while operating within a bounded context window.
Cross-Domain Generalization: Training across multiple verticals enables the model to handle domains outside its training distribution.
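Taken together, these capabilities describe a loop: decompose the question, search in parallel, then prune the accumulated context. The sketch below illustrates that pattern only; it is not Chroma's implementation, and the decomposition, search, and scoring functions are stand-ins for the model's actual behavior.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(query):
    # Stand-in for the model's query decomposition: a multi-constraint
    # question becomes targeted subqueries instead of one direct answer.
    return [
        "Which company released a 20B retrieval model?",
        "What base model was it trained from?",
    ]

def search(subquery):
    # Stand-in for a search tool call; returns documents with
    # relevance scores attached.
    return [{"text": f"result for: {subquery}", "score": 0.8},
            {"text": f"noise for: {subquery}", "score": 0.2}]

def run_turn(query, context, relevance_threshold=0.5):
    subqueries = decompose(query)
    # Parallel tool calls: issue all subqueries in one turn rather
    # than searching sequentially, cutting end-to-end latency.
    with ThreadPoolExecutor() as pool:
        results = pool.map(search, subqueries)
    for docs in results:
        context.extend(docs)
    # Self-editing context: drop documents judged irrelevant so the
    # context window stays bounded over long search horizons.
    return [d for d in context if d["score"] >= relevance_threshold]

context = run_turn("complex multi-hop question", [])
print(len(context))  # prints 2: only the relevant documents survive pruning
```

In a real agent, the pruning decision would itself be made by the model rather than by a fixed threshold; the threshold here only stands in for that judgment.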
Critical Limitation: Agent Harness Required
A significant caveat: Context-1 requires a specific agent harness to function as described in the technical report. This harness manages tool execution, token budgets, context pruning, and deduplication. Chroma has not yet publicly released the harness, meaning users cannot currently reproduce the reported performance metrics by running the model directly.
Chroma states the harness will be released "soon" and that its technical report describes the harness design in detail. The open-source community will need to wait for this release to fully evaluate the model's actual performance.
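Until the harness ships, its described responsibilities (tool execution, token budgets, context pruning, deduplication) can only be guessed at. A minimal sketch of two of those duties, budget enforcement and deduplication, might look like the following; every name here is illustrative, not taken from Chroma's unreleased code.

```python
def approx_tokens(doc):
    # Crude token estimate; a real harness would use the model's tokenizer.
    return len(doc.split())

def harness_step(retrieved, context, token_budget=2000):
    """Merge newly retrieved documents into the running context,
    skipping duplicates and stopping at the token budget."""
    seen = set(context)
    for doc in retrieved:
        if doc in seen:            # deduplication across turns
            continue
        cost = approx_tokens(doc)
        if sum(approx_tokens(d) for d in context) + cost > token_budget:
            break                  # stop before exceeding the budget
        context.append(doc)
        seen.add(doc)
    return context

ctx = harness_step(["alpha beta", "alpha beta", "gamma"], [])
print(ctx)  # prints ['alpha beta', 'gamma']: duplicate dropped, budget respected
```

Anyone reimplementing the harness from the technical report would also need to wire in actual tool execution and the model's own pruning calls, which this sketch omits.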
Pricing and Availability
The model is available on Hugging Face under the Apache 2.0 license. There is no pricing to report: Context-1 is an open-weight model intended for self-hosting or integration into applications, and no inference provider has deployed it yet.
What This Means
Context-1 represents a shift toward specialized, task-optimized models rather than general-purpose scaling. Instead of making a frontier model handle every task, Chroma argues for delegating retrieval to a smaller, cheaper, faster specialist agent. The reported 10x speed advantage and fraction-of-the-cost positioning suggest potential value for organizations building retrieval-augmented generation (RAG) systems.
However, the unavailable agent harness is a substantial gap: the technical report's performance claims cannot be independently verified until Chroma releases the harness and evaluation code. Teams considering adoption should either wait for the full release or build their own harness from the technical report's specifications.