
Microsoft releases Harrier embedding models with 32K context window, achieving 74.3 on MTEB v2

TL;DR

Microsoft released the Harrier-OSS embedding model family, comprising three variants with 270M, 600M, and 27B parameters. The largest model achieves 74.3 on the Multilingual MTEB v2 benchmark. All models support 32,768 max tokens and multilingual inputs across 40+ languages.


Microsoft Releases Harrier Embedding Models with State-of-the-Art Multilingual Performance

Microsoft has released Harrier-OSS, a family of multilingual text embedding models designed for retrieval, clustering, semantic similarity, classification, and reranking tasks. The open-source models are available on Hugging Face.

Model Specifications

The Harrier family includes three variants:

| Model | Parameters | Embedding Dimension | Max Context | MTEB v2 Score |
| --- | --- | --- | --- | --- |
| harrier-oss-v1-270m | 270M | 640 | 32,768 tokens | 66.5 |
| harrier-oss-v1-0.6b | 600M | 1,024 | 32,768 tokens | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 32,768 tokens | 74.3 |

All models use decoder-only architectures with last-token pooling and L2 normalization to generate dense embeddings. The 270M and 600M variants employ knowledge distillation from larger embedding models during training.
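To make the pooling step concrete, here is a minimal sketch of last-token pooling followed by L2 normalization, written in plain Python on toy data. The function names, the right-padding assumption, and the numbers are illustrative and are not taken from Microsoft's code; a real pipeline would operate on the model's hidden-state tensors.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last real (non-padding) token per sequence.

    Assumes right padding and at least one real token in every sequence.
    """
    pooled = []
    for tokens, mask in zip(hidden_states, attention_mask):
        last_index = sum(mask) - 1  # position of the last token with mask == 1
        pooled.append(tokens[last_index])
    return pooled

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy batch: 2 sequences, 3 token positions, hidden size 2 (made-up values).
hidden_states = [
    [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]],  # third position is padding
    [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]],  # no padding
]
attention_mask = [[1, 1, 0], [1, 1, 1]]

embeddings = [l2_normalize(v) for v in last_token_pool(hidden_states, attention_mask)]
print(embeddings)  # [[0.6, 0.8], [0.0, 1.0]]
```

Because each output vector has unit length, the dot product of two embeddings equals their cosine similarity.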

Training and Capabilities

Microsoft trained all variants using contrastive learning on multilingual datasets covering diverse embedding tasks. The models support 40+ languages including English, Spanish, French, German, Chinese, Japanese, Arabic, and Hindi.

Key capabilities span:

  • Dense passage retrieval
  • Semantic similarity scoring
  • Text clustering
  • Bitext mining
  • Zero-shot classification and reranking
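As a rough illustration of how retrieval and similarity scoring typically work with L2-normalized embeddings, the sketch below ranks documents by dot product against a query vector. The vectors are hand-made two-dimensional stand-ins, not real Harrier outputs, and `rank_documents` is a hypothetical helper.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def rank_documents(query_embedding, document_embeddings):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [
        (doc_id, dot(query_embedding, embedding))
        for doc_id, embedding in document_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Unit-length toy vectors (assumed to be already normalized).
query_embedding = [0.6, 0.8]
document_embeddings = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.8, 0.6],
}

ranking = rank_documents(query_embedding, document_embeddings)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.2f}")
```

Here `doc_c` ranks first because its direction is closest to the query's. At scale, the sorted scan would normally be replaced by a vector index.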

Each model requires a task-specific instruction prepended to the query at inference time, for example "Instruct: Retrieve semantically similar text\nQuery: [user query]", where \n denotes a newline. Documents do not require instructions.
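The query template can be sketched as a small helper. The function names below are hypothetical, and the exact template (spacing, newline placement) should be checked against the model card before use; this simply mirrors the example string quoted above.

```python
def format_query(task_description: str, query: str) -> str:
    """Build the instructed query string: instruction first, then the query."""
    return f"Instruct: {task_description}\nQuery: {query}"

def format_document(text: str) -> str:
    """Documents are embedded as-is, with no instruction."""
    return text

task = "Retrieve semantically similar text"
print(format_query(task, "how do embedding models handle long documents?"))
print(format_document("Embedding models map text to dense vectors."))
```

Keeping the task description as a parameter lets the same code serve retrieval, classification, or clustering by swapping the instruction text.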

Technical Details

The models are compatible with both the Sentence Transformers and native Hugging Face Transformers libraries. Weights are stored in BF16 precision and serialized in Safetensors format. The Safetensors listing for the 270M variant reports a model size of 0.3B parameters.
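BF16 uses 2 bytes per parameter, so a back-of-the-envelope estimate of weight size follows from the parameter counts. The sketch below uses the nominal counts from the table (270M, 600M, 27B), which are rounded figures; actual checkpoint sizes will differ, and activations and runtime overhead are not included.

```python
BYTES_PER_BF16_PARAM = 2  # bfloat16 is a 16-bit format

# Nominal parameter counts as reported in the article (rounded).
nominal_params = {
    "harrier-oss-v1-270m": 270_000_000,
    "harrier-oss-v1-0.6b": 600_000_000,
    "harrier-oss-v1-27b": 27_000_000_000,
}

def weight_gigabytes(num_params, bytes_per_param=BYTES_PER_BF16_PARAM):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

for name, count in nominal_params.items():
    print(f"{name}: ~{weight_gigabytes(count):.2f} GB of weights")
```

By this estimate the weights alone come to roughly 0.54 GB, 1.2 GB, and 54 GB respectively, which is why the smallest and largest variants target very different hardware.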

Microsoft notes that reproduced scores may differ slightly from reported benchmarks due to library version differences in PyTorch and Transformers.

Performance Claims

According to Microsoft, the Harrier models achieve state-of-the-art results on the Multilingual MTEB v2 benchmark as of the release date. The 27B model scores 74.3, which is 5.3 points above the 600M model (69.0) and 7.8 points above the 270M model (66.5).

What This Means

Harrier fills a gap for production embedding models that handle long sequences (32K tokens) and multilingual content without reliance on proprietary APIs. The three-tier parameter design allows organizations to choose between efficiency (270M for edge deployment) and accuracy (27B for complex retrieval). The requirement for task-specific instructions during inference adds operational complexity but enables customization across different search and classification scenarios. Open-source availability means researchers can fine-tune variants for domain-specific embeddings without vendor lock-in.
