Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark
Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B parameter variant achieves a 69.0 score on the Multilingual MTEB v2 benchmark with support for 32,768 token context windows and 45+ languages.
Microsoft has released Harrier-OSS-v1, a family of decoder-only text embedding models designed for multilingual retrieval, clustering, and semantic similarity tasks. The models use last-token pooling with L2 normalization and support a 32,768 token context window across all variants.
Model Variants and Performance
Microsoft offers three model sizes:
| Model | Parameters | Embedding Dimension | MTEB v2 Score |
|---|---|---|---|
| harrier-oss-v1-270m | 270M | 640 | 66.5 |
| harrier-oss-v1-0.6b | 0.6B | 1,024 | 69.0 |
| harrier-oss-v1-27b | 27B | 5,376 | 74.3 |
Each variant achieves state-of-the-art results for its size class on the Multilingual MTEB v2 benchmark as of release. The models are distributed in BF16 format via Hugging Face.
Training and Architecture
The models use contrastive learning objectives trained on large-scale multilingual datasets covering diverse downstream tasks. The 270M and 0.6B variants additionally incorporate knowledge distillation from larger embedding models to improve performance at smaller scales.
The architecture relies on last-token pooling—the embedding of the final non-padding token serves as the sentence representation—followed by L2 normalization. This pooling strategy is handled automatically in the Sentence Transformers library.
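Microsoft's pooling code isn't reproduced in the article; the following is a minimal NumPy sketch of last-token pooling with L2 normalization as described above. The tensor shapes and values are purely illustrative.

```python
import numpy as np

def last_token_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Take the hidden state of the last non-padding token in each sequence,
    then L2-normalize it to produce the sentence embedding."""
    # Index of the last non-padding token per sequence
    last_idx = attention_mask.sum(axis=1) - 1            # shape: (batch,)
    batch_idx = np.arange(hidden_states.shape[0])
    pooled = hidden_states[batch_idx, last_idx]          # shape: (batch, dim)
    # L2 normalization, so cosine similarity reduces to a dot product
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / norms

# Toy batch: 2 sequences of length 4, hidden size 3
hidden = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0],    # 3 real tokens, 1 padding token
                 [1, 1, 1, 1]])   # no padding
emb = last_token_pool(hidden, mask)
print(emb.shape)                        # (2, 3)
print(np.linalg.norm(emb, axis=1))      # each row has unit norm
```

Note how the attention mask, not the raw sequence length, selects the pooled position: for the first sequence the embedding comes from token index 2, skipping the padding slot.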
Multilingual Support and Applications
Harrier models support 45+ languages including Arabic, Bulgarian, Czech, German, English, Spanish, French, Hebrew, Hindi, Japanese, Korean, Polish, Portuguese, Russian, Turkish, Ukrainian, Vietnamese, and Chinese, among others.
The models are designed for:
- Dense retrieval and semantic search
- Text clustering and classification
- Semantic similarity computation
- Bitext mining
- Reranking tasks
Instruction-Based Fine-Tuning
A key feature: all models expect task-specific instructions on the query side. Users provide one-sentence task descriptions (e.g., "Given a web search query, retrieve relevant passages") to achieve optimal performance, while documents are embedded without instructions. This allows embeddings to be customized for different scenarios through natural language prompts. Pre-configured prompts include web_search_query, sts_query, and bitext_query.
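The article names the pre-configured prompts but not Harrier's exact template, so the `Instruct: ...\nQuery: ...` format and the `sts_query`/`bitext_query` task descriptions below are assumptions, borrowed from the convention other instruction-tuned embedding models use; only the web-search description comes from the article.

```python
# Hypothetical prompt table: only "web_search_query" is quoted in the article;
# the other descriptions and the template itself are illustrative assumptions.
TASKS = {
    "web_search_query": "Given a web search query, retrieve relevant passages",
    "sts_query": "Given a sentence, retrieve semantically similar sentences",
    "bitext_query": "Given a sentence, retrieve its translation",
}

def format_query(query: str, prompt_name: str = "web_search_query") -> str:
    """Prefix a query with its one-sentence task instruction."""
    return f"Instruct: {TASKS[prompt_name]}\nQuery: {query}"

def format_document(text: str) -> str:
    """Documents are embedded as-is, without an instruction."""
    return text

print(format_query("best pizza in Naples"))
```

The asymmetry matters in practice: only queries carry the instruction, so a document indexed once can be searched under several different task prompts.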
Implementation
The models integrate with both Sentence Transformers and standard Hugging Face Transformers libraries. The 0.6B variant processes inputs up to 32,768 tokens, making it suitable for long-document encoding tasks. Microsoft provides code examples for both libraries, including query-document ranking workflows.
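Microsoft's own ranking example isn't reprinted here. As a stand-in, this sketch shows the core of a query-document ranking workflow on already-computed embeddings; in real use, the vectors would come from `model.encode(...)` on a Sentence Transformers model loaded from the (here unnamed) Hugging Face repository.

```python
import numpy as np

def rank(query_emb: np.ndarray, doc_embs: np.ndarray):
    """Rank documents by cosine similarity to the query.
    Embeddings are assumed L2-normalized, so a dot product is the cosine."""
    scores = doc_embs @ query_emb
    order = np.argsort(-scores)   # best match first
    return order, scores[order]

# Stand-in unit vectors; real embeddings would be 1,024-dimensional
# for the 0.6B variant, per the spec table above.
q = np.array([1.0, 0.0])
docs = np.array([[0.6, 0.8],
                 [1.0, 0.0],
                 [0.0, 1.0]])
order, scores = rank(q, docs)
print(order)   # document 1 ranks first
```

Because the embeddings are pre-normalized by the model, no per-query normalization is needed at ranking time, which keeps search over a large index to a single matrix-vector product.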
What This Means
Microsoft positions Harrier as an open-source alternative to proprietary embedding APIs, targeting organizations needing multilingual support at production scale. The three-tier sizing strategy allows cost-sensitive deployments (270M) alongside higher-accuracy variants (27B) within a single family. The instruction-based approach trades ease-of-use for task-specific customization—users must engineer prompts rather than relying on general-purpose embeddings. Evaluation on MTEB v2 provides standardized comparison, though practical performance depends on downstream application specifics and instruction quality.