model release

Microsoft releases Harrier embedding models with 32K token context, tops multilingual benchmark

TL;DR

Microsoft has released Harrier-OSS-v1, a family of multilingual text embedding models trained with contrastive learning and knowledge distillation. The 0.6B parameter variant achieves a 69.0 score on the Multilingual MTEB v2 benchmark with support for 32,768 token context windows and 45+ languages.


Microsoft Releases Harrier Multilingual Embedding Models

Microsoft has released Harrier-OSS-v1, a family of decoder-only text embedding models designed for multilingual retrieval, clustering, and semantic similarity tasks. The models use last-token pooling with L2 normalization and support a 32,768 token context window across all variants.

Model Variants and Performance

Microsoft offers three model sizes:

Model                 Parameters   Embedding Dimension   MTEB v2 Score
harrier-oss-v1-270m   270M         640                   66.5
harrier-oss-v1-0.6b   0.6B         1,024                 69.0
harrier-oss-v1-27b    27B          5,376                 74.3

All three variants achieve state-of-the-art results on the Multilingual MTEB v2 benchmark as of their release. The models are distributed via Hugging Face in BF16 precision.

Training and Architecture

The models use contrastive learning objectives trained on large-scale multilingual datasets covering diverse downstream tasks. The 270M and 0.6B variants additionally incorporate knowledge distillation from larger embedding models to improve performance at smaller scales.
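Microsoft does not publish the exact loss function, but contrastive training for embedding models typically uses an InfoNCE-style objective with in-batch negatives. The sketch below is illustrative only, assuming L2-normalized embeddings and a temperature hyperparameter; it is not Harrier's actual training code:

```python
import numpy as np

def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    """InfoNCE contrastive loss with in-batch negatives.

    query_emb, doc_emb: (batch, dim) L2-normalized embeddings, where
    doc_emb[i] is the positive passage for query_emb[i] and every
    other row in the batch serves as a negative.
    """
    # Similarity matrix: entry (i, j) = cos(query_i, doc_j) / temperature
    sims = query_emb @ doc_emb.T / temperature
    # Log-softmax over each row, then cross-entropy against the diagonal
    # (the matched query-document pairs)
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

The loss pushes each query toward its paired document and away from the other documents in the batch; knowledge distillation (for the two smaller variants) would add a separate term matching a teacher model's similarity scores.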

The architecture relies on last-token pooling—the embedding of the final non-padding token serves as the sentence representation—followed by L2 normalization. This pooling strategy is handled automatically in the Sentence Transformers library.
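The pooling step described above fits in a few lines. The snippet below is an illustrative NumPy sketch, not Microsoft's code, and assumes right-padded inputs (a left-padded batch would index the last token differently):

```python
import numpy as np

def last_token_pool(hidden_states, attention_mask):
    """Pool the final non-padding token and L2-normalize it.

    hidden_states:  (batch, seq_len, dim) final-layer activations
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # Index of the last non-padding token in each sequence
    last_idx = attention_mask.sum(axis=1) - 1
    pooled = hidden_states[np.arange(hidden_states.shape[0]), last_idx]
    # L2 normalization, so dot products between embeddings are
    # cosine similarities
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
```

When the models are loaded through Sentence Transformers, this pooling and normalization happens behind `encode()`, so no manual step is needed.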

Multilingual Support and Applications

Harrier models support 45+ languages including Arabic, Bulgarian, Czech, German, English, Spanish, French, Hebrew, Hindi, Japanese, Korean, Polish, Portuguese, Russian, Turkish, Ukrainian, Vietnamese, and Chinese, among others.

The models are designed for:

  • Dense retrieval and semantic search
  • Text clustering and classification
  • Semantic similarity computation
  • Bitext mining
  • Reranking tasks

Instruction-Based Fine-Tuning

A key feature: all models require task-specific instructions at query time for optimal performance. Users provide a one-sentence task description (e.g., "Given a web search query, retrieve relevant passages") alongside each query; on the document side, instructions are optional. This allows embeddings to be tailored to different scenarios through natural language prompts. Pre-configured prompts include web_search_query, sts_query, and bitext_query.
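The announcement does not spell out the prompt template itself. The helper below is a hypothetical sketch following the "Instruct: ... / Query: ..." convention popularized by other instruction-tuned embedding models; the actual Harrier template may differ:

```python
def format_query(task_description, query):
    """Prepend a one-sentence task instruction to a query.

    The template here is an assumption, modeled on the convention
    used by several instruction-tuned embedding models; consult the
    model card for the exact format.
    """
    return f"Instruct: {task_description}\nQuery: {query}"

prompt = format_query(
    "Given a web search query, retrieve relevant passages",
    "how do transformers handle long context?",
)
```

Documents would be encoded without any instruction prefix, since instructions are only needed on the query side.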

Implementation

The models integrate with both Sentence Transformers and standard Hugging Face Transformers libraries. The 0.6B variant processes inputs up to 32,768 tokens, making it suitable for long-document encoding tasks. Microsoft provides code examples for both libraries, including query-document ranking workflows.
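Microsoft's own examples are not reproduced here, but once embeddings are L2-normalized, a query-document ranking workflow reduces to a dot product and a sort. A minimal sketch, assuming precomputed embeddings:

```python
import numpy as np

def rank_documents(query_emb, doc_embs):
    """Rank documents by cosine similarity to a query.

    query_emb: (dim,) L2-normalized query embedding
    doc_embs:  (n_docs, dim) L2-normalized document embeddings

    Because the vectors are normalized, the dot product equals
    cosine similarity.
    """
    scores = doc_embs @ query_emb
    order = np.argsort(-scores)  # highest similarity first
    return order, scores[order]
```

In practice `query_emb` and `doc_embs` would come from the model's `encode()` call; at corpus scale the brute-force dot product would be replaced by an approximate nearest-neighbor index.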

What This Means

Microsoft positions Harrier as an open-source alternative to proprietary embedding APIs, targeting organizations needing multilingual support at production scale. The three-tier sizing strategy allows cost-sensitive deployments (270M) alongside higher-accuracy variants (27B) within a single family. The instruction-based approach trades ease-of-use for task-specific customization—users must engineer prompts rather than relying on general-purpose embeddings. Evaluation on MTEB v2 provides standardized comparison, though practical performance depends on downstream application specifics and instruction quality.

Related Articles

model release

Chroma releases Context-1, a 20B parameter retrieval agent for complex multi-hop search

Chroma has released Context-1, a 20B parameter Mixture of Experts model trained specifically for retrieval tasks that require multi-hop reasoning. The model decomposes complex queries into subqueries, performs parallel tool calls, and actively prunes its own context mid-search—achieving comparable performance to frontier models at a fraction of the cost and up to 10x faster inference speed.

product update

Microsoft expands Copilot Cowork with AI model critique feature and cross-model comparison

Microsoft is expanding Copilot Cowork availability and introducing a Critique function that enables one AI model to review another's output. The update also includes a new Researcher agent claiming best-in-class deep research performance, outperforming Perplexity by 7 points, and a Model Council feature for direct model comparison.

product update

Microsoft Copilot Researcher adds multi-model features using GPT and Claude

Microsoft has enabled its Copilot Researcher tool to simultaneously leverage OpenAI's GPT and Anthropic's Claude through two new features: Critique, which uses GPT responses refined by Claude, and Model Council, which displays side-by-side outputs with agreement/disagreement analysis. Both features are rolling out in the Microsoft 365 Copilot Frontier early access program.

model release

Cohere releases 2B open-source speech model with 5.42% word error rate

Cohere has released Transcribe, a 2 billion parameter open-source automatic speech recognition model that the company claims tops the Hugging Face Open ASR Leaderboard with a 5.42% word error rate. The model supports 14 languages and is available under Apache 2.0 license, outperforming OpenAI's Whisper Large v3 and competing models on both accuracy and throughput metrics.
