Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark
Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B parameter variant achieves a 69.0 score on the Multilingual MTEB v2 benchmark with support for 32,768 token context windows and 45+ languages.
Microsoft has released Harrier-OSS-v1, a family of decoder-only text embedding models designed for multilingual retrieval, clustering, and semantic similarity tasks. The models use last-token pooling with L2 normalization and support a 32,768 token context window across all variants.
Model Variants and Performance
Microsoft offers three model sizes:
| Model | Parameters | Embedding Dimension | MTEB v2 Score |
|---|---|---|---|
| harrier-oss-v1-270m | 270M | 640 | 66.5 |
| harrier-oss-v1-0.6b | 0.6B | 1,024 | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 74.3 |
Each variant achieves state-of-the-art results for its size class on the Multilingual MTEB v2 benchmark as of release. The models are distributed in BF16 format via Hugging Face.
Training and Architecture
The models use contrastive learning objectives trained on large-scale multilingual datasets covering diverse downstream tasks. The 270M and 0.6B variants additionally incorporate knowledge distillation from larger embedding models to improve performance at smaller scales.
The architecture relies on last-token pooling—the embedding of the final non-padding token serves as the sentence representation—followed by L2 normalization. This pooling strategy is handled automatically in the Sentence Transformers library.
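Microsoft's pooling code isn't reproduced in the article; the following is a minimal NumPy sketch of last-token pooling with L2 normalization as described above. The tensor shapes and values are purely illustrative.

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Take the hidden state of the last non-padding token in each sequence,
    then L2-normalize it to produce the sentence embedding."""
    # Index of the last non-padding token per sequence
    last_idx = attention_mask.sum(axis=1) - 1            # shape: (batch,)
    batch_idx = np.arange(hidden_states.shape[0])
    pooled = hidden_states[batch_idx, last_idx]          # shape: (batch, dim)
    # L2 normalization, so cosine similarity reduces to a dot product
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / norms

# Toy batch: 2 sequences of length 4, hidden size 3
hidden = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0],    # 3 real tokens, 1 padding token
                 [1, 1, 1, 1]])   # no padding
emb = last_token_pool(hidden, mask)
print(emb.shape)                        # (2, 3)
print(np.linalg.norm(emb, axis=1))      # each row has unit norm
```

Note how the attention mask, not the raw sequence length, selects the pooled position: for the first sequence the embedding comes from token index 2, skipping the padding slot.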
Multilingual Support and Applications
Harrier models support 45+ languages including Arabic, Bulgarian, Czech, German, English, Spanish, French, Hebrew, Hindi, Japanese, Korean, Polish, Portuguese, Russian, Turkish, Ukrainian, Vietnamese, and Chinese, among others.
The models are designed for:
- Dense retrieval and semantic search
- Text clustering and classification
- Semantic similarity computation
- Bitext mining
- Reranking tasks
Instruction-Based Fine-Tuning
A key feature: all models expect task-specific instructions on the query side. Users provide one-sentence task descriptions (e.g., "Given a web search query, retrieve relevant passages") to achieve optimal performance, while documents are embedded without instructions. This allows embeddings to be customized for different scenarios through natural language prompts. Pre-configured prompts include web_search_query, sts_query, and bitext_query.
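The article names the pre-configured prompts but not Harrier's exact template, so the `Instruct: ...\nQuery: ...` format and the `sts_query`/`bitext_query` task descriptions below are assumptions, borrowed from the convention other instruction-tuned embedding models use; only the web-search description comes from the article.

```python
# Hypothetical prompt table: only "web_search_query" is quoted in the article;
# the other descriptions and the template itself are illustrative assumptions.
TASKS = {
    "web_search_query": "Given a web search query, retrieve relevant passages",
    "sts_query": "Given a sentence, retrieve semantically similar sentences",
    "bitext_query": "Given a sentence, retrieve its translation",
}

def format_query(query: str, prompt_name: str = "web_search_query") -> str:
    """Prefix a query with its one-sentence task instruction."""
    return f"Instruct: {TASKS[prompt_name]}\nQuery: {query}"

def format_document(text: str) -> str:
    """Documents are embedded as-is, without an instruction."""
    return text

print(format_query("best pizza in Naples"))
```

The asymmetry matters in practice: only queries carry the instruction, so a document indexed once can be searched under several different task prompts.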
Implementation
The models integrate with both Sentence Transformers and standard Hugging Face Transformers libraries. The 0.6B variant processes inputs up to 32,768 tokens, making it suitable for long-document encoding tasks. Microsoft provides code examples for both libraries, including query-document ranking workflows.
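Microsoft's own ranking example isn't reprinted here. As a stand-in, this sketch shows the core of a query-document ranking workflow on already-computed embeddings; in real use, the vectors would come from `model.encode(...)` on a Sentence Transformers model loaded from the (here unnamed) Hugging Face repository.

```python
import numpy as np

def rank(query_emb: np.ndarray, doc_embs: np.ndarray):
    """Rank documents by cosine similarity to the query.
    Embeddings are assumed L2-normalized, so a dot product is the cosine."""
    scores = doc_embs @ query_emb
    order = np.argsort(-scores)   # best match first
    return order, scores[order]

# Stand-in unit vectors; real embeddings would be 1,024-dimensional
# for the 0.6B variant, per the spec table above.
q = np.array([1.0, 0.0])
docs = np.array([[0.6, 0.8],
                 [1.0, 0.0],
                 [0.0, 1.0]])
order, scores = rank(q, docs)
print(order)   # document 1 ranks first
```

Because the embeddings are pre-normalized by the model, no per-query normalization is needed at ranking time, which keeps search over a large index to a single matrix-vector product.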
What This Means
Microsoft positions Harrier as an open-source alternative to proprietary embedding APIs, targeting organizations needing multilingual support at production scale. The three-tier sizing strategy allows cost-sensitive deployments (270M) alongside higher-accuracy variants (27B) within a single family. The instruction-based approach trades ease-of-use for task-specific customization—users must engineer prompts rather than relying on general-purpose embeddings. Evaluation on MTEB v2 provides standardized comparison, though practical performance depends on downstream application specifics and instruction quality.