Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark
Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B parameter variant achieves a 69.0 score on the Multilingual MTEB v2 benchmark with support for 32,768 token context windows and 45+ languages.
Microsoft has released Harrier-OSS-v1, a family of decoder-only text embedding models designed for multilingual retrieval, clustering, and semantic similarity tasks. The models use last-token pooling with L2 normalization and support a 32,768 token context window across all variants.
Model Variants and Performance
Microsoft offers three model sizes:
| Model | Parameters | Embedding Dimension | MTEB v2 Score |
|---|---|---|---|
| harrier-oss-v1-270m | 270M | 640 | 66.5 |
| harrier-oss-v1-0.6b | 0.6B | 1,024 | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 74.3 |
All three variants achieve state-of-the-art results on the Multilingual MTEB v2 benchmark as of their release. The models are distributed via Hugging Face in BF16 precision.
Training and Architecture
The models use contrastive learning objectives trained on large-scale multilingual datasets covering diverse downstream tasks. The 270M and 0.6B variants additionally incorporate knowledge distillation from larger embedding models to improve performance at smaller scales.
The architecture relies on last-token pooling—the embedding of the final non-padding token serves as the sentence representation—followed by L2 normalization. This pooling strategy is handled automatically in the Sentence Transformers library.
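The pooling step described above can be sketched in a few lines. This is a minimal illustration, not Harrier's actual implementation: it assumes right-padded batches and stand-in tensors in place of real model hidden states.

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Select the hidden state of the last non-padding token in each
    sequence, then L2-normalize it to form the sentence embedding."""
    # Index of the last non-padding token per sequence (right padding assumed).
    last_idx = attention_mask.sum(axis=1) - 1            # shape: (batch,)
    batch_idx = np.arange(hidden_states.shape[0])
    pooled = hidden_states[batch_idx, last_idx]          # shape: (batch, dim)
    # L2 normalization, so dot products between embeddings equal cosine similarity.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / norms

# Toy batch: 2 sequences of length 4 with hidden size 3.
hidden = np.random.randn(2, 4, 3)
mask = np.array([[1, 1, 1, 0],   # 3 real tokens, 1 pad token
                 [1, 1, 1, 1]])  # 4 real tokens, no padding
emb = last_token_pool(hidden, mask)
```

In a real pipeline the `hidden_states` would be the final-layer outputs of the decoder; Sentence Transformers performs the equivalent pooling and normalization internally.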
Multilingual Support and Applications
Harrier models support 45+ languages including Arabic, Bulgarian, Czech, German, English, Spanish, French, Hebrew, Hindi, Japanese, Korean, Polish, Portuguese, Russian, Turkish, Ukrainian, Vietnamese, and Chinese, among others.
The models are designed for:
- Dense retrieval and semantic search
- Text clustering and classification
- Semantic similarity computation
- Bitext mining
- Reranking tasks
Instruction-Based Fine-Tuning
A key feature: the models expect task-specific instructions at query time. Users provide a one-sentence task description (e.g., "Given a web search query, retrieve relevant passages") to achieve optimal performance. Instructions apply only to queries; documents are embedded without them. This allows embeddings to be customized for different scenarios through natural-language prompts. Pre-configured prompt names include web_search_query, sts_query, and bitext_query.
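The asymmetry between queries and documents can be sketched as a small prompt-formatting helper. Note that the instruction template and the prompt wordings below are assumptions for illustration; the article does not document Harrier's exact format, and in practice the library's pre-configured prompts would be used instead.

```python
# Hypothetical prompt registry mirroring the pre-configured prompt names
# mentioned above. Instruction texts and the "Instruct:/Query:" template
# are illustrative assumptions, not the documented Harrier format.
PROMPTS = {
    "web_search_query": "Given a web search query, retrieve relevant passages",
    "sts_query": "Retrieve semantically similar text",
    "bitext_query": "Retrieve parallel sentences in another language",
}

def format_query(text: str, prompt_name: str) -> str:
    """Prepend the task instruction to a query string."""
    instruction = PROMPTS[prompt_name]
    return f"Instruct: {instruction}\nQuery: {text}"

# Queries get an instruction prefix; documents are encoded as-is.
query = format_query("how do solar panels work", "web_search_query")
document = "Solar panels convert sunlight into electricity using photovoltaic cells."
```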
Implementation
The models integrate with both Sentence Transformers and standard Hugging Face Transformers libraries. The 0.6B variant processes inputs up to 32,768 tokens, making it suitable for long-document encoding tasks. Microsoft provides code examples for both libraries, including query-document ranking workflows.
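Because the embeddings are L2-normalized, the query-document ranking step reduces to a matrix product followed by a sort. The sketch below uses stand-in random vectors in place of real model outputs; in an actual workflow they would come from encoding one query and several documents.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize vectors along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for one query embedding and four document embeddings.
rng = np.random.default_rng(0)
query_emb = normalize(rng.standard_normal(8))
doc_embs = normalize(rng.standard_normal((4, 8)))

# Dot product of unit vectors == cosine similarity.
scores = doc_embs @ query_emb
ranking = np.argsort(-scores)  # document indices, best match first
```

The same pattern scales to large corpora by batching the matrix product or handing the normalized vectors to an approximate nearest-neighbor index.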
What This Means
Microsoft positions Harrier as an open-source alternative to proprietary embedding APIs, targeting organizations needing multilingual support at production scale. The three-tier sizing strategy allows cost-sensitive deployments (270M) alongside higher-accuracy variants (27B) within a single family. The instruction-based approach trades ease-of-use for task-specific customization—users must engineer prompts rather than relying on general-purpose embeddings. Evaluation on MTEB v2 provides standardized comparison, though practical performance depends on downstream application specifics and instruction quality.
Related Articles
IBM Releases 97M-Parameter Granite Embedding Model With 60.3 MTEB Score — Highest Retrieval Quality Under 100M Parameters
IBM released two new multilingual embedding models under Apache 2.0: a 97M-parameter compact model scoring 60.3 on MTEB Multilingual Retrieval (highest in its size class) and a 311M full-size model scoring 65.2. Both support 200+ languages with enhanced retrieval for 52 languages, handle 32K-token context (64x increase over predecessors), and include code retrieval across 9 programming languages.
Microsoft Cancels Claude Code Licenses, Pushes Developers to GitHub Copilot CLI
Microsoft is removing Claude Code access from its Experiences + Devices division by June 30, 2026, redirecting thousands of engineers to GitHub Copilot CLI instead. The decision follows six months of Claude Code proving more popular than Microsoft's own coding tool among internal developers.
Microsoft Edge mobile adds multi-tab summarization, podcast generation, and browsing history recall via Copilot
Microsoft Edge mobile version 148 and higher integrates six AI-powered features from its desktop version, including the ability to summarize multiple tabs simultaneously, generate podcasts from web pages, and recall browsing history for continued conversations. The update also adds a Journeys feature that tracks research topics and a Study and Learn mode for interactive quizzes.
Microsoft Edge adds Copilot feature to analyze content across all open browser tabs
Microsoft is updating Edge to let Copilot read and analyze content across all open browser tabs simultaneously. The update includes AI-generated podcasts from tabs, study mode with quizzes, and long-term conversation memory.