LLM News | TPS

New benchmark evaluates music reward models trained on text, lyrics, and audio

Researchers have released CMI-RewardBench, a comprehensive evaluation framework for music reward models that handle mixed text, lyrics, and audio inputs. The benchmark includes 110,000 pseudo-labeled samples and human-annotated data, along with publicly available reward models designed for fine-grained music generation alignment.

March 5, 2026 · 5:06 AM · 1 min read

benchmark music-generation reward-models

via arxiv.org