
Microsoft open-sources Harrier embedding model with 27B parameters, 131K context window

TL;DR

Microsoft's Bing team has open-sourced Harrier, a 27-billion-parameter embedding model that supports over 100 languages and features a 131,072-token context window. The model ranks first on the MTEB v2 multilingual benchmark, outperforming proprietary offerings from OpenAI and Amazon, and is available on Hugging Face under the MIT license.



Microsoft's Bing team has released Harrier, an open-source embedding model trained on over two billion examples augmented with synthetic data from GPT-5. The model is available in three sizes: a full 27-billion-parameter version, a 0.6-billion-parameter variant, and a 270-million-parameter lightweight option.

Key Specifications

The flagship Harrier-OSS-v1-27B model features:

  • Context window: 131,072 tokens (4x larger than comparable models)
  • Embedding dimension: 5,376
  • Active parameters: 25.6B of 27.0B total
  • Language support: 100+ languages
  • License: MIT (fully open-source)
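The 5,376-dimension output has practical storage implications for anyone self-hosting a vector index. A rough sketch of the arithmetic (assuming unquantized float32 vectors; a production index would likely quantize to shrink this):

```python
# Back-of-envelope storage cost for Harrier's 5,376-dim embeddings.
# Assumes float32 storage; real deployments often quantize (e.g., int8).

EMBED_DIM = 5376          # Harrier-OSS-v1-27B output dimension
BYTES_PER_FLOAT32 = 4

def index_size_bytes(num_docs: int, dim: int = EMBED_DIM) -> int:
    """Raw vector storage for num_docs embeddings, excluding index overhead."""
    return num_docs * dim * BYTES_PER_FLOAT32

print(index_size_bytes(1))          # 21,504 bytes (~21 KB) per document
print(index_size_bytes(1_000_000))  # 21,504,000,000 bytes (~21.5 GB) for 1M docs
```

At roughly 21 KB per vector, a million-document corpus needs about 21.5 GB of raw vector storage before any index overhead, which is one reason the smaller variants exist.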

According to Microsoft's team, the synthetic portion of the training data was generated with GPT-5, though no independent verification of the training methodology has been published.

Benchmark Performance

Harrier achieves a Borda score of 78% on the MTEB v2 multilingual benchmark, ranking it first overall. Microsoft claims this outperforms proprietary models from OpenAI and Amazon; Google's Gemini Embedding 001, by comparison, scores 99% zero-shot accuracy but ranks 5th on Borda scoring. Direct head-to-head comparisons on identical benchmarks are not provided in the available documentation.

Other top performers include KaLM-Embedding-Gemma3-12B (73% Borda), Llama-Embed-Nemotron-8B (7.0B params), and Qwen3-Embedding-8B (6.9B params).

Model Variants and Distribution

Smaller variants address different computational requirements:

  • Harrier-OSS-v1-0.6B: 0.44B active parameters, 32K context window, designed for edge deployment
  • 270M variant: Ultra-lightweight option for resource-constrained environments

All models are hosted on Hugging Face under MIT licensing, enabling commercial and research use without restrictions.

Intended Applications

Microsoft plans to integrate Harrier into Bing search and next-generation AI agent grounding services. The company describes embedding models as "increasingly critical" for multi-step agent tasks requiring information retrieval and organization.

What This Means

Harrier represents a strategic shift toward open-source tooling for enterprise AI infrastructure. By releasing a top-performing multilingual embedding model under permissive licensing, Microsoft reduces friction for developers building retrieval-augmented generation (RAG) systems and AI agents. The 131K context window positions Harrier above many commercial alternatives, addressing a specific gap in the market where context size matters for document-heavy retrieval tasks.
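In a typical RAG pipeline, embeddings like Harrier's are compared by cosine similarity to rank candidate documents. A minimal, self-contained sketch of that retrieval step (toy 4-dimensional vectors stand in for real 5,376-dimensional model output):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    order = np.argsort(-scores)[:k]     # indices of the k best matches
    return order.tolist(), scores[order].tolist()

# Toy embeddings; a real system would encode documents with the model itself.
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
])
query = np.array([1.0, 0.1, 0.0, 0.0])
idx, scores = top_k(query, docs)
print(idx)  # [0, 2]: document 0 is the closest match, document 2 second
```

The retrieved passages are then fed to a generator model as grounding context; the embedding model's quality determines how relevant that context is.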

The release also signals competitive pressure in the embedding model space—historically dominated by closed APIs from OpenAI and Cohere. Open alternatives from Meta (Llama Embeddings) and now Microsoft may accelerate adoption of self-hosted embedding infrastructure among enterprises concerned with vendor lock-in or data residency.

The pricing advantage is significant: self-hosted Harrier incurs only compute costs, versus per-API-call charges from proprietary services. However, third-party evaluation of multilingual quality parity across all 100+ supported languages is still pending.
