leaderboard

1 article tagged with leaderboard

April 21, 2026
benchmarkTiiuae

QIMMA Arabic Leaderboard Discards 3.1% of ArabicMMLU Samples After Quality Validation

TII UAE released QIMMA, an Arabic LLM leaderboard that validates benchmark quality before evaluating models. The validation pipeline, using Qwen3-235B and DeepSeek-V3 plus human review, discarded 3.1% of ArabicMMLU samples and found systematic quality issues across 14 benchmarks.