LLM News

Every LLM release, update, and milestone.

research

UniLID: New language identification method achieves over 70% accuracy on low-resource languages with just 5 samples per language

Researchers introduce UniLID, a language identification method that uses tokenizer-based unigram distributions to identify languages and dialects from very few labeled examples. The approach achieves over 70% accuracy on low-resource languages with only five labeled examples per language, substantially outperforming existing systems such as fastText, GlotLID, and CLD3 in those settings.
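To make the idea of a unigram-distribution classifier concrete, here is a minimal sketch. It is not the UniLID implementation: the tokenizer is a toy stand-in (overlapping character trigrams rather than a trained subword tokenizer), and the function names, the smoothing constant `alpha`, and the tiny training set are all assumptions made for illustration. Each language gets a smoothed token-frequency table built from five labeled sentences, and a new text is assigned to the language whose table gives it the highest log-likelihood.

```python
import math
from collections import Counter


def tokenize(text):
    """Toy stand-in for a subword tokenizer: overlapping character trigrams."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def fit_unigram_models(samples):
    """Count token frequencies per language from a few labeled examples."""
    counts = {}
    for lang, texts in samples.items():
        counter = Counter()
        for text in texts:
            counter.update(tokenize(text))
        counts[lang] = counter
    vocab = set().union(*counts.values())
    return counts, len(vocab)


def identify(text, counts, vocab_size, alpha=0.1):
    """Return the language whose smoothed unigram model best explains the text."""
    tokens = tokenize(text)
    best_lang, best_score = None, -math.inf
    for lang, counter in counts.items():
        total = sum(counter.values())
        # Additive smoothing; the extra +1 reserves mass for unseen tokens.
        denom = total + alpha * (vocab_size + 1)
        score = sum(math.log((counter[tok] + alpha) / denom) for tok in tokens)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Hypothetical training data: five labeled sentences per language.
samples = {
    "en": [
        "the cat is on the table",
        "where are you going today",
        "this is a very good book",
        "I do not know the answer",
        "we will see you tomorrow",
    ],
    "es": [
        "el gato está en la mesa",
        "a dónde vas hoy",
        "este es un libro muy bueno",
        "no sé la respuesta",
        "nos vemos mañana por la tarde",
    ],
}

counts, vocab_size = fit_unigram_models(samples)
print(identify("the book is on the table", counts, vocab_size))
print(identify("el libro está en la mesa", counts, vocab_size))
```

A per-language frequency table has very few parameters to estimate, which is one plausible reason an approach of this kind can work with a handful of examples; the accuracy figures reported above come from the researchers' own system and evaluation, not from this sketch.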