TSEmbed combines mixture-of-experts with LoRA to scale multimodal embeddings across conflicting tasks

Researchers propose TSEmbed, a multimodal embedding framework that combines Mixture-of-Experts (MoE) with Low-Rank Adaptation (LoRA) to handle task conflicts in universal embedding models. The approach introduces Expert-Aware Negative Sampling (EANS) to improve discriminative power and achieves state-of-the-art results on the Massive Multimodal Embedding Benchmark (MMEB).

TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings

A new research paper presents TSEmbed, a framework designed to address a core limitation in multimodal large language models: the inability to effectively serve as universal embedding models when handling multiple conflicting tasks simultaneously.

The Core Problem

While MLLMs demonstrate strong reasoning capabilities, converting them into general-purpose embedding models faces a critical bottleneck. Different downstream tasks often have conflicting objectives—what optimizes performance on one task can degrade performance on another. This task conflict makes it difficult to build a single, universal embedding model that performs well across diverse applications.

TSEmbed's Approach

The framework addresses this through three main components:

1. Mixture-of-Experts with LoRA: TSEmbed synergizes MoE architecture with Low-Rank Adaptation (LoRA) to explicitly disentangle conflicting task objectives. MoE allows different experts to specialize in different tasks, while LoRA provides efficient parameter updates without requiring full model retraining.
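The paper summary doesn't include an implementation, but the MoE-with-LoRA idea can be sketched as a frozen base projection plus a set of LoRA adapters acting as experts, mixed by a learned router. The class, parameter names, and initialization below are illustrative assumptions, not TSEmbed's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELoRALinear(nn.Module):
    """Frozen base linear layer with N LoRA adapters as experts.

    A minimal sketch of the MoE-with-LoRA pattern: the router produces
    a distribution over experts, and each expert contributes a low-rank
    update to the frozen base projection.
    """
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only adapters and router train
        d_in, d_out = base.in_features, base.out_features
        self.lora_A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, d_out))
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):  # x: (batch, d_in)
        gates = F.softmax(self.router(x), dim=-1)           # (batch, E)
        # per-expert low-rank updates: x @ A_e @ B_e for each expert e
        delta = torch.einsum('bi,eir,ero->beo', x, self.lora_A, self.lora_B)
        update = torch.einsum('be,beo->bo', gates, delta)   # gate-weighted sum
        return self.base(x) + update, gates
```

Because each expert is a rank-r update rather than a full copy of the layer, adding an expert costs roughly r × (d_in + d_out) parameters instead of d_in × d_out, which is what makes scaling the expert count cheap.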

2. Expert-Aware Negative Sampling (EANS): A novel training strategy that leverages expert routing distributions as a semantic similarity proxy. Rather than sampling random negatives, EANS dynamically prioritizes hard negatives—examples that are challenging but informative—by selecting negatives that share similar expert activation patterns with the query. This approach sharpens the model's ability to distinguish between similar examples and refines embedding boundaries.
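Under the stated assumptions, the EANS selection step can be sketched as: given routing distributions for a query and a pool of candidates, rank candidates by routing similarity and keep the closest ones as hard negatives. The cosine measure and the function name are illustrative, not the paper's exact formulation:

```python
import torch

def expert_aware_negatives(query_gates, cand_gates, k=2):
    """Rank candidate negatives by how closely their expert routing
    matches the query's (a hypothetical helper, not the paper's API).

    query_gates: (E,) routing distribution for the query
    cand_gates:  (N, E) routing distributions for candidate negatives
    Returns indices of the k candidates with the most similar routing.
    """
    sim = torch.cosine_similarity(cand_gates, query_gates.unsqueeze(0), dim=-1)
    return sim.topk(k).indices
```

In a contrastive loop these indices would replace uniformly sampled in-batch negatives, concentrating the loss on examples the router already treats as semantically close to the query.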

3. Two-Stage Learning: To ensure training stability, the researchers implemented a two-stage paradigm. The first stage solidifies expert specialization, establishing clear task-specific routing patterns. The second stage then optimizes representations using EANS, building on this specialized foundation.
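The two-stage paradigm can be sketched as one training loop that switches negative-sampling strategy at the stage boundary and, as an assumption here, freezes the router so stage-1 routing patterns persist. The toy MoE encoder, InfoNCE-style loss, and freezing choice are all illustrative, not the paper's recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEEncoder(nn.Module):
    """Toy MoE encoder: a router softly mixes per-expert projections."""
    def __init__(self, d_in=16, d_emb=8, num_experts=4):
        super().__init__()
        self.router = nn.Linear(d_in, num_experts)
        self.experts = nn.ModuleList(nn.Linear(d_in, d_emb) for _ in range(num_experts))

    def forward(self, x):
        gates = F.softmax(self.router(x), dim=-1)                # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, d_emb)
        return (gates.unsqueeze(-1) * outs).sum(dim=1), gates

def two_stage_train(model, batches, stage1_steps, lr=1e-3):
    """Stage 1: random in-batch negatives while routing stabilizes.
    Stage 2: freeze the router (an assumption) and switch to
    expert-aware hard negatives, per the EANS idea."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for step, (q, pos, cands) in enumerate(batches):
        if step == stage1_steps:                        # entering stage 2
            for p in model.router.parameters():
                p.requires_grad = False
        zq, gq = model(q)
        zp, _ = model(pos)
        zc, gc = model(cands)
        if step < stage1_steps:
            idx = torch.randint(len(zc), (len(zq),))    # random negatives
        else:
            idx = (gq @ gc.T).argmax(dim=-1)            # expert-aware negatives
        neg = zc[idx]
        logits = torch.stack([F.cosine_similarity(zq, zp, dim=-1),
                              F.cosine_similarity(zq, neg, dim=-1)], dim=1) / 0.07
        loss = F.cross_entropy(logits, torch.zeros(len(zq), dtype=torch.long))
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```

The design choice the paper motivates is visible here: if hard negatives were mined from a still-changing router in stage 1, the "hardness" signal would be noise; stabilizing routing first makes the stage-2 signal meaningful.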

Results

TSEmbed achieves state-of-the-art performance on the Massive Multimodal Embedding Benchmark (MMEB) and proves effective on real-world industrial production datasets. These results suggest that explicitly disentangling tasks through expert specialization lets universal multimodal embeddings scale across tasks that would otherwise conflict.

What This Means

This work addresses a genuine gap in embedding model design: most universal embedding models today either specialize in a single domain or make task-agnostic tradeoffs that hurt performance. By routing different tasks to specialized experts through MoE, and by using EANS to mine harder negatives during training, TSEmbed offers a path toward embedding models that genuinely handle diverse applications without degradation. The validation on industrial production datasets suggests the approach holds up in real deployment scenarios, not just on benchmarks. For organizations building embedding-based retrieval or semantic search systems, this signals that properly optimized task-specialized architectures may outperform generalist approaches.