leaderboard
1 article tagged with leaderboard
April 21, 2026
benchmark, Tiiuae
QIMMA Arabic Leaderboard Discards 3.1% of ArabicMMLU Samples After Quality Validation
TII UAE released QIMMA, an Arabic LLM leaderboard that validates benchmark quality before evaluating models. The validation pipeline, using Qwen3-235B and DeepSeek-V3 plus human review, discarded 3.1% of ArabicMMLU samples and found systematic quality issues across 14 benchmarks.