Tencent Releases Hy-MT2: 1.8B Translation Model Compressed to 440MB With 1.25-Bit Quantization
Tencent has open-sourced Hy-MT2, a family of multilingual translation models available in 1.8B, 7B, and 30B-A3B parameter sizes. The models support translation across 33 languages and include extreme quantization down to 1.25-bit, reducing the 1.8B model to 440MB storage while increasing inference speed by 1.5x.
Tencent has open-sourced Hy-MT2, a family of multilingual translation models designed for complex real-world scenarios, released on May 21, 2025. The family includes three model sizes: 1.8B, 7B, and 30B-A3B (mixture-of-experts architecture).
Model Specifications
All three models support translation among 33 languages and can follow translation instructions in multiple languages. The 1.8B base model is available in multiple quantization formats:
- Standard FP8 quantization
- GGUF format for llama.cpp deployment
- 2-bit quantization
- 1.25-bit extreme quantization via AngelSlim
The 1.25-bit quantization reduces storage requirements to just 440MB and improves inference speed by 1.5x compared to the unquantized version, according to Tencent.
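As a rough sanity check on the 440MB figure, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate, not Tencent's accounting: it assumes exactly 1.8 billion parameters, a uniform bit width for every weight, 16-bit storage for the unquantized model, and 1 MB = 10^6 bytes. None of these details are stated in the announcement.

```python
# Back-of-the-envelope storage estimate for a weight-quantized model.
# Assumptions (not from Tencent): exactly 1.8e9 parameters, every weight
# stored at the same bit width, no metadata, and 1 MB = 10**6 bytes.

def quantized_size_mb(num_params: float, bits_per_param: float) -> float:
    """Storage in megabytes if every parameter used bits_per_param bits."""
    return num_params * bits_per_param / 8 / 1e6


def effective_bits(num_params: float, size_mb: float) -> float:
    """Average bits per parameter implied by a given file size."""
    return size_mb * 1e6 * 8 / num_params


NUM_PARAMS = 1.8e9          # rounded parameter count from the model name
REPORTED_SIZE_MB = 440      # figure reported by Tencent

naive_mb = quantized_size_mb(NUM_PARAMS, 1.25)         # uniform 1.25-bit weights
fp16_mb = quantized_size_mb(NUM_PARAMS, 16)            # assumed 16-bit baseline
avg_bits = effective_bits(NUM_PARAMS, REPORTED_SIZE_MB)

print(f"uniform 1.25-bit estimate: {naive_mb:.2f} MB")   # 281.25 MB
print(f"assumed 16-bit baseline:   {fp16_mb:.0f} MB")     # 3600 MB
print(f"implied average bit width: {avg_bits:.2f} bits")  # 1.96 bits
```

Under these assumptions, the reported 440MB is larger than the uniform 1.25-bit estimate of roughly 281MB and corresponds to an average of about 1.96 bits per parameter. One plausible but unconfirmed explanation is that some tensors, such as embeddings or quantization scales, are kept at higher precision.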
Performance Claims
Tencent claims the 7B and 30B-A3B models outperform open-source models including DeepSeek-V4-Pro and Kimi K2.6 in "fast-thinking mode." The company states the lightweight 1.8B model surpasses mainstream commercial translation APIs from Microsoft and Doubao (ByteDance's service) overall.
The models were evaluated across general translation, real-world business scenarios, domain-specific tasks, and instruction-following capabilities. Tencent has released IFMTBench, a new benchmark specifically for evaluating translation instruction-following performance.
Inference Configuration
For the 1.8B and 7B models, Tencent recommends:
- Temperature: 0.7
- Top-p: 0.6
- Top-k: 20
- Repetition penalty: 1.05
- Max tokens: 4096
The 30B-A3B model uses different parameters: top-p of 1.0, top-k of -1, and no repetition penalty.
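The recommendations above can be captured in a small lookup, sketched here in plain Python. The function name `recommended_sampling` and the dictionary keys are illustrative choices rather than an official Hy-MT2 API, and key names differ between inference libraries. Encoding "no repetition penalty" as 1.0 is an assumption based on the common convention that 1.0 is the neutral value. The article does not state a temperature or max-token value for the 30B-A3B model, so none is filled in.

```python
from typing import Dict, Union

Number = Union[int, float]

# Values as reported in the article. Key names are illustrative only.
_STATED_SETTINGS: Dict[str, Dict[str, Number]] = {
    "1.8B": {
        "temperature": 0.7,
        "top_p": 0.6,
        "top_k": 20,
        "repetition_penalty": 1.05,
        "max_tokens": 4096,
    },
    # Only the parameters the article states for the MoE model are listed.
    "30B-A3B": {
        "top_p": 1.0,               # nucleus sampling effectively disabled
        "top_k": -1,                # -1 as given in the article
        "repetition_penalty": 1.0,  # assumption: 1.0 encodes "no penalty"
    },
}
# The 7B model shares the 1.8B recommendations.
_STATED_SETTINGS["7B"] = dict(_STATED_SETTINGS["1.8B"])


def recommended_sampling(model_size: str) -> Dict[str, Number]:
    """Return a copy of the stated sampling settings for a model size."""
    try:
        return dict(_STATED_SETTINGS[model_size])
    except KeyError:
        raise ValueError(f"no stated settings for model size {model_size!r}") from None


if __name__ == "__main__":
    for size in ("1.8B", "7B", "30B-A3B"):
        print(size, recommended_sampling(size))
```

Returning a copy keeps callers from accidentally mutating the shared defaults when they override a single value for an experiment.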
Deployment and Training
The models support deployment via transformers (version 5.6.0+), vLLM, and SGLang. Tencent provides a complete training pipeline supporting both full-parameter fine-tuning and LoRA fine-tuning with DeepSpeed ZeRO configurations and LLaMA-Factory integration.
The company has also released Hy-MT2-Translator Skill for easier integration of the model series into translation workflows.
WMT26 Partnership
Tencent announced an official partnership with WMT26 (Workshop on Machine Translation) for the "Video Subtitle Translation Task." Participants using Hy-MT models in the general machine translation and video subtitle translation tasks are eligible for special awards sponsored by Hunyuan.
All models are available on Hugging Face and ModelScope. Pricing for API access has not been disclosed.
What This Means
Shrinking the 1.8B model to 440MB with 1.25-bit quantization is notable for on-device deployment, where model size is a critical constraint. However, Tencent's performance claims require independent verification: comparisons against commercial APIs such as Microsoft Translator need a reproducible methodology. The partnership with WMT26 suggests Tencent is seeking academic credibility for these models in addition to commercial deployment. If the extreme quantization approach is validated, it could bring translation models of this capability level to resource-constrained devices that previously could not run them.