Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark
Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B-parameter variant scores 69.0 on the Multilingual MTEB v2 benchmark and supports a 32,768-token context window and more than 45 languages.
Microsoft Releases Harrier Multilingual Embedding Models
Microsoft has released Harrier-OSS-v1, a family of decoder-only text embedding models designed for multilingual retrieval, clustering, and semantic similarity tasks. The models use last-token pooling with L2 normalization and support a 32,768 token context window across all variants.
Model Variants and Performance
Microsoft offers three model sizes:
| Model | Parameters | Embedding Dimension | Multilingual MTEB v2 Score |
|---|---|---|---|
| harrier-oss-v1-270m | 270M | 640 | 66.5 |
| harrier-oss-v1-0.6b | 0.6B | 1,024 | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 74.3 |
All three variants achieve state-of-the-art results on the Multilingual MTEB v2 benchmark as of their release. The models are distributed in BF16 tensor format via Hugging Face.
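To put the table and the BF16 format in practical terms, here is a back-of-the-envelope sketch of storage needs. It assumes 2 bytes per parameter for BF16 weights and, for the vector index, uncompressed float32 values; the parameter counts are the nominal figures from the table rather than exact counts, and the helper names are illustrative.

```python
# Rough storage estimates. All figures are approximations based on the
# nominal parameter counts in the table, not measured checkpoint sizes.
BYTES_PER_BF16 = 2

NOMINAL_PARAMS = {
    "harrier-oss-v1-270m": 270e6,
    "harrier-oss-v1-0.6b": 0.6e9,
    "harrier-oss-v1-27b": 27e9,
}


def bf16_weight_gb(num_params: float) -> float:
    """Approximate weight size in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_BF16 / 1e9


def index_gb(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Approximate size of a flat, uncompressed vector index in gigabytes."""
    return num_vectors * dim * bytes_per_value / 1e9


for name, params in NOMINAL_PARAMS.items():
    print(f"{name}: ~{bf16_weight_gb(params):.2f} GB of weights")
```

Under these assumptions the 27B variant needs roughly 54 GB for weights alone, and one million 1,024-dimensional float32 vectors occupy about 4.1 GB, so embedding dimension matters as much as model size when planning a deployment.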
Training and Architecture
The models are trained with contrastive learning objectives on large-scale multilingual datasets covering diverse downstream tasks. The 270M and 0.6B variants additionally incorporate knowledge distillation from larger embedding models to improve performance at smaller scales.
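For readers unfamiliar with contrastive training, the sketch below shows InfoNCE, one widely used contrastive loss. The article does not say which objective Microsoft used, so treat this as a generic illustration and not Harrier's actual training code; the temperature value and function names are assumptions.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for a single query.

    Assumes every vector is already L2-normalized, so the dot product
    equals cosine similarity. The temperature is an illustrative value.
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]
```

The loss is small when the query sits closer to its positive than to any negative, and grows as a negative becomes more similar to the query than the positive is.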
The architecture relies on last-token pooling: the hidden state of the final non-padding token serves as the representation of the whole input, which is then L2-normalized. The Sentence Transformers library applies this pooling automatically.
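A minimal sketch of last-token pooling followed by L2 normalization, written in plain Python on toy data so it runs without a model. The function names and the list-based representation are assumptions for illustration; real implementations operate on batched tensors.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token.

    hidden_states: list of per-token vectors for one sequence.
    attention_mask: list of 1 (real token) or 0 (padding), same length.
    Scanning for the last 1 works for both left- and right-padded inputs.
    """
    last_index = max(i for i, m in enumerate(attention_mask) if m == 1)
    return hidden_states[last_index]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy sequence: two real tokens followed by one padding position.
hidden = [[1.0, 1.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
embedding = l2_normalize(last_token_pool(hidden, mask))
```

Because the output has unit length, the dot product between two such embeddings equals their cosine similarity.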
Multilingual Support and Applications
Harrier models support 45+ languages including Arabic, Bulgarian, Czech, German, English, Spanish, French, Hebrew, Hindi, Japanese, Korean, Polish, Portuguese, Russian, Turkish, Ukrainian, Vietnamese, and Chinese, among others.
The models are designed for:
- Dense retrieval and semantic search
- Text clustering and classification
- Semantic similarity computation
- Bitext mining
- Reranking tasks
Instruction-Based Queries
All models expect a task-specific instruction at query time. For best results, users supply a one-sentence task description (e.g., "Given a web search query, retrieve relevant passages") with each query; instructions are optional for document embeddings. This lets the same model be tailored to different scenarios through natural-language prompts, with no retraining. Pre-configured prompts include web_search_query, sts_query, and bitext_query.
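The asymmetry between queries and documents can be sketched as follows. The `Instruct: ... Query: ...` template is an assumption borrowed from other instruction-tuned embedding models; the article does not give Harrier's exact wording, so check the model card before relying on it. Both helper names are hypothetical.

```python
def format_query(task_description: str, query: str) -> str:
    """Attach a one-sentence task description to a query.

    The template below is an assumed format, not confirmed for Harrier.
    """
    return f"Instruct: {task_description}\nQuery: {query}"


def format_document(document: str) -> str:
    """Documents are embedded as-is; instructions are optional for them."""
    return document


task = "Given a web search query, retrieve relevant passages"
query_text = format_query(task, "how do embedding models handle long documents")
doc_text = format_document("Long inputs are encoded in a single forward pass.")
```

Keeping the instruction in one place, as here, makes it easy to swap task descriptions per use case without touching the indexing side.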
Implementation
The models integrate with both the Sentence Transformers and the standard Hugging Face Transformers libraries. Like the other variants, the 0.6B model accepts inputs of up to 32,768 tokens, making it suitable for long-document encoding tasks. Microsoft provides code examples for both libraries, including query-document ranking workflows.
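A query-document ranking workflow reduces to scoring each document embedding against the query embedding and sorting. The sketch below uses hand-written toy vectors in place of real model output and assumes all vectors are already L2-normalized; it is not Microsoft's example code, and the function names are illustrative.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_documents(query_embedding, document_embeddings):
    """Return (document_index, score) pairs, highest score first."""
    scored = [(i, dot(query_embedding, d))
              for i, d in enumerate(document_embeddings)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy unit vectors standing in for real embeddings.
query_vec = [0.6, 0.8]
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

ranking = rank_documents(query_vec, doc_vecs)
top_index = ranking[0][0]  # index of the best-matching document
```

With real embeddings the same loop is usually replaced by a matrix multiplication or an approximate nearest-neighbor index, but the scoring rule is unchanged.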
What This Means
Microsoft positions Harrier as an open-source alternative to proprietary embedding APIs, targeting organizations that need multilingual support at production scale. The three-tier sizing lets a single family cover both cost-sensitive deployments (270M) and higher-accuracy use cases (27B). The instruction-based approach trades ease of use for task-specific customization: users must write task prompts instead of relying on general-purpose embeddings. Evaluation on MTEB v2 provides a standardized comparison, though practical performance depends on the downstream application and on instruction quality.