Tencent Releases Hy-MT2 Translation Models: 1.8B, 7B, and 30B-A3B Support 33 Languages

TL;DR

Tencent released Hy-MT2, a family of multilingual translation models available in 1.8B, 7B, and 30B-A3B (MoE) sizes. All models support translation among 33 languages and follow translation instructions in multiple languages. The 1.8B model can be compressed to 440MB using 1.25-bit AngelSlim quantization.

Tencent has open-sourced Hy-MT2, a family of "fast-thinking" multilingual translation models designed for complex real-world scenarios. The release includes three model sizes: 1.8B, 7B, and 30B-A3B (Mixture of Experts), all supporting translation among 33 languages.

Model Specifications

All three Hy-MT2 models can follow translation instructions in multiple languages. For on-device deployment, Tencent's AngelSlim 1.25-bit extreme quantization reduces the 1.8B model's storage requirement to 440MB and increases inference speed by 1.5x.
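As a rough sanity check on the 440MB figure, the sketch below works through the storage arithmetic. It assumes exactly 1.8 billion parameters and 1 MB = 10^6 bytes, neither of which is stated in the release, and the function names are illustrative only.

```python
def ideal_size_mb(num_params: float, bits_per_weight: float) -> float:
    """Storage in MB if every parameter used exactly bits_per_weight bits."""
    return num_params * bits_per_weight / 8 / 1e6


def effective_bits_per_param(size_mb: float, num_params: float) -> float:
    """Average bits per parameter implied by a reported file size in MB."""
    return size_mb * 1e6 * 8 / num_params


# Assumption: "1.8B" means exactly 1.8e9 parameters.
NUM_PARAMS = 1.8e9

print(f"{ideal_size_mb(NUM_PARAMS, 1.25):.2f}")            # 281.25
print(f"{effective_bits_per_param(440, NUM_PARAMS):.2f}")  # 1.96
```

Under these assumptions, a pure 1.25-bit encoding would take about 281MB, so the reported 440MB works out to roughly 1.96 bits per parameter on average. One plausible reading is that some tensors or metadata are stored at higher precision, but the release summary here does not say.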

The models are released in multiple formats and precisions:

  • Full precision models
  • FP8 quantized versions
  • GGUF format for llama.cpp
  • 2-bit GGUF quantization
  • 1.25-bit GGUF quantization (1.8B only)

Performance Claims

According to Tencent, multi-dimensional evaluations show the models deliver strong performance across general, real-world business, domain-specific, and instruction-following translation tasks. The company claims the 7B and 30B-A3B models outperform open-source models including DeepSeek-V4-Pro and Kimi K2.6 in fast-thinking mode. Tencent also claims the 1.8B model surpasses commercial APIs from Microsoft and Doubao.

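For inference, Tencent recommends temperature 0.7, top_p 0.6, top_k 20, and repetition_penalty 1.05 for the 1.8B and 7B models. For the 30B-A3B model it recommends temperature 0.7, top_p 1.0, top_k -1, and repetition_penalty 1.0; a top_p of 1.0 and a repetition_penalty of 1.0 are neutral values, and in vLLM-style samplers a top_k of -1 disables top-k filtering.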
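The recommended settings can be kept in one place and looked up by model size. The following is a minimal sketch: the values come from the recommendations above, while the dictionary name, the size keys, and the helper function are hypothetical conveniences, not part of any Tencent tooling.

```python
# Recommended sampling settings as reported in the article.
_SMALL_MODEL_SETTINGS = {
    "temperature": 0.7,
    "top_p": 0.6,
    "top_k": 20,
    "repetition_penalty": 1.05,
}

RECOMMENDED_SAMPLING = {
    "1.8B": _SMALL_MODEL_SETTINGS,
    "7B": _SMALL_MODEL_SETTINGS,
    "30B-A3B": {
        "temperature": 0.7,
        "top_p": 1.0,
        "top_k": -1,  # -1 is the vLLM-style "no top-k filtering" value
        "repetition_penalty": 1.0,
    },
}


def sampling_kwargs(model_size: str) -> dict:
    """Return a fresh copy of the recommended settings for a model size."""
    if model_size not in RECOMMENDED_SAMPLING:
        known = ", ".join(RECOMMENDED_SAMPLING)
        raise ValueError(f"unknown model size {model_size!r}; expected one of: {known}")
    return dict(RECOMMENDED_SAMPLING[model_size])
```

The four keys match parameter names accepted by vLLM's SamplingParams, so the result could be unpacked there directly. Other runtimes may use a different convention for disabling top-k, so it is worth checking before passing -1 elsewhere.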

Benchmark Release

Alongside the models, Tencent open-sourced IFMTBench, a new benchmark for evaluating translation instruction-following capabilities. The models support various translation scenarios including terminology-aware translation, style-specific translation, personalized translation, delimiter preservation, and structured data translation.
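To make one of these scenarios concrete, here is a small sketch of what a delimiter-preservation check could look like. It is not IFMTBench's method, which the announcement does not describe; the regular expression, the choice of delimiters (XML-like tags and curly-brace placeholders), and the function name are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical delimiter pattern: XML-like tags such as <b> or </b>,
# and curly-brace placeholders such as {file_name}.
DELIMITER_PATTERN = re.compile(r"</?[A-Za-z][\w-]*>|\{[\w.]+\}")


def delimiters_preserved(source: str, translation: str) -> bool:
    """True if both texts contain the same delimiters the same number of times.

    Order is ignored on purpose, because word order can legitimately
    change between languages.
    """
    source_counts = Counter(DELIMITER_PATTERN.findall(source))
    translation_counts = Counter(DELIMITER_PATTERN.findall(translation))
    return source_counts == translation_counts


source = "Click <b>Save</b> to store {file_name}."
good = "Cliquez sur <b>Enregistrer</b> pour stocker {file_name}."
bad = "Cliquez sur Enregistrer pour stocker le fichier."

print(delimiters_preserved(source, good))  # True
print(delimiters_preserved(source, bad))   # False
```

A check like this only confirms that markup survived translation. It says nothing about translation quality, which needs separate evaluation.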

Deployment

The models are compatible with transformers (version 5.6.0+), vLLM, SGLang, and llama.cpp. Tencent notes the GGUF format depends on their STQ kernel, released in llama.cpp PR #22836.

Tencent is partnering with WMT26 for the Video Subtitle Translation Task and offering special awards for participants using Hy-MT models in the General Machine Translation Task.

What This Means

Tencent's release adds specialized translation models to the open-source ecosystem, addressing a specific use case often handled by general-purpose LLMs. The 1.8B model's 440MB footprint after extreme quantization makes it viable for mobile and edge deployment. However, the company's performance claims comparing against commercial APIs require independent verification. The 33-language support and instruction-following capabilities suggest these models could compete with translation-specific services, though real-world performance in production environments remains to be tested.
