Microsoft releases Harrier embedding models with 32K context window, achieving 74.3 on MTEB v2
Microsoft released the Harrier-OSS embedding model family, comprising three variants with 270M, 600M, and 27B parameters. The largest model achieves 74.3 on the Multilingual MTEB v2 benchmark. All models support a 32,768-token context window and multilingual inputs across 40+ languages.
Microsoft has released Harrier-OSS, a family of multilingual text embedding models designed for retrieval, clustering, semantic similarity, classification, and reranking tasks. The open-source models are available on Hugging Face.
Model Specifications
The Harrier family includes three variants:
| Model | Parameters | Embedding Dimension | Max Context | MTEB v2 Score |
|---|---|---|---|---|
| harrier-oss-v1-270m | 270M | 640 | 32,768 tokens | 66.5 |
| harrier-oss-v1-0.6b | 600M | 1,024 | 32,768 tokens | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 32,768 tokens | 74.3 |
All models use decoder-only architectures with last-token pooling and L2 normalization to generate dense embeddings. The 270M and 600M variants employ knowledge distillation from larger embedding models during training.
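The pooling step described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: it assumes the model has already produced per-token hidden states and a padding mask, and it simply selects the last non-padding token's vector and L2-normalizes it.

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pick the hidden state of the last non-padding token and L2-normalize it.

    hidden_states: (seq_len, dim) array of per-token vectors for one sequence.
    attention_mask: (seq_len,) array of 1s for real tokens, 0s for padding.
    """
    last_idx = int(attention_mask.sum()) - 1          # index of last real token
    vec = hidden_states[last_idx]
    # L2 normalization makes cosine similarity a plain dot product
    return vec / np.linalg.norm(vec)

# Toy example: 3 positions, 2 dims, third position is padding
hidden = np.array([[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]])
mask = np.array([1, 1, 0])
print(last_token_pool(hidden, mask))  # [0.6 0.8]
```

With right-padded batches this reduces to indexing by the mask's sum; left-padded batches would instead take the final position directly.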
Training and Capabilities
Microsoft trained all variants using contrastive learning on multilingual datasets covering diverse embedding tasks. The models support 40+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, and Hindi.
Key capabilities span:
- Dense passage retrieval
- Semantic similarity scoring
- Text clustering
- Bitext mining
- Zero-shot classification and reranking
Each model requires a task-specific instruction prepended to queries during inference, for example: "Instruct: Retrieve semantically similar text\nQuery: [user query]". Documents do not require instructions.
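The query format shown above can be wrapped in a small helper. This is a sketch based on the template in the article; the exact instruction strings for each task would come from the model card.

```python
def format_query(task_instruction: str, query: str) -> str:
    """Prepend a task instruction to a query, per the Harrier prompt template.

    Documents are embedded as-is; only queries carry an instruction.
    """
    return f"Instruct: {task_instruction}\nQuery: {query}"

formatted = format_query("Retrieve semantically similar text",
                         "how do embedding models work?")
print(formatted)
# Instruct: Retrieve semantically similar text
# Query: how do embedding models work?
```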
Technical Details
The models are compatible with both the Sentence Transformers library and native Hugging Face Transformers. They use BF16 tensor precision and are serialized in Safetensors format; the smallest variant's Safetensors checkpoint is roughly 0.3B parameters.
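A typical Sentence Transformers workflow for an embedding model of this kind looks like the sketch below. The model id `microsoft/harrier-oss-v1-270m` is an assumption (the article does not give the exact Hugging Face id), so the encoding calls are left commented out; the similarity helper relies on the embeddings being L2-normalized, as described earlier.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # For L2-normalized embeddings, cosine similarity is just the dot product
    return float(np.dot(a, b))

# Hypothetical usage, assuming the model id below (not confirmed in the article):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("microsoft/harrier-oss-v1-270m")
# q = model.encode("Instruct: Retrieve semantically similar text\nQuery: what is RAG?",
#                  normalize_embeddings=True)
# d = model.encode("Retrieval-augmented generation combines search with LLMs.",
#                  normalize_embeddings=True)
# print(cosine_sim(q, d))
```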
Microsoft notes that reproduced scores may differ slightly from reported benchmarks due to library version differences in PyTorch and Transformers.
Performance Claims
According to Microsoft, the Harrier models achieve state-of-the-art results on the Multilingual MTEB v2 benchmark as of the release date. The 27B model clearly outperforms the smaller variants, scoring 74.3 against 69.0 and 66.5, respectively.
What This Means
Harrier fills a gap for production embedding models that handle long sequences (32K tokens) and multilingual content without reliance on proprietary APIs. The three-tier parameter design allows organizations to choose between efficiency (270M for edge deployment) and accuracy (27B for complex retrieval). The requirement for task-specific instructions during inference adds operational complexity but enables customization across different search and classification scenarios. Open-source availability means researchers can fine-tune variants for domain-specific embeddings without vendor lock-in.
Related Articles
Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark
Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B parameter variant achieves a 69.0 score on the Multilingual MTEB v2 benchmark with support for 32,768 token context windows and 45+ languages.
Arcee AI releases Trinity Large Thinking, open-source reasoning model with 262K context window
Arcee AI has released Trinity Large Thinking, an open-source reasoning model featuring a 262,144 token context window. The model is priced at $0.25 per million input tokens and $0.90 per million output tokens, with free access available through OpenRouter for the first five days.
Microsoft expands Copilot Cowork with AI model critique feature and cross-model comparison
Microsoft is expanding Copilot Cowork availability and introducing a Critique function that enables one AI model to review another's output. The update also includes a new Researcher agent claiming best-in-class deep research performance, outperforming Perplexity by 7 points, and a Model Council feature for direct model comparison.
Microsoft Copilot Researcher adds multi-model features using GPT and Claude
Microsoft has enabled its Copilot Researcher tool to simultaneously leverage OpenAI's GPT and Anthropic's Claude through two new features: Critique, which uses GPT responses refined by Claude, and Model Council, which displays side-by-side outputs with agreement/disagreement analysis. Both features are rolling out in the Microsoft 365 Copilot Frontier early access program.