MPCEval benchmark reveals multi-party conversation generation lags on speaker consistency
Researchers introduce MPCEval, a specialized benchmark for evaluating multi-party conversation generation, a capability increasingly used in smart reply and collaborative AI assistants. The benchmark decomposes conversation quality into three dimensions: speaker modeling, content quality, and speaker-content consistency. Its results indicate that current models struggle with participation balance and with keeping each speaker's behavior consistent across longer exchanges.
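
To make "participation balance" concrete, here is a minimal sketch of one way such a property could be scored: the normalized Shannon entropy of per-speaker turn counts. This is an illustrative assumption, not MPCEval's actual metric; the function name `participation_balance` and the entropy-based formulation are hypothetical and not taken from the benchmark.

```python
import math
from collections import Counter


def participation_balance(speakers):
    """Hypothetical balance score, NOT MPCEval's definition.

    Takes the sequence of speaker labels, one per turn, and returns the
    normalized Shannon entropy of the turn counts:
    close to 1.0 when turns are evenly shared, 0.0 when one speaker dominates entirely.
    """
    counts = Counter(speakers)
    # With fewer than two observed speakers there is nothing to balance.
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Divide by the maximum possible entropy for this number of speakers.
    return entropy / math.log(len(counts))


even = participation_balance(["A", "B", "C", "A", "B", "C"])
skewed = participation_balance(["A"] * 8 + ["B", "C"])
print(even > skewed)
```

Note that this sketch only counts speakers who take at least one turn, so a participant who never speaks would not lower the score. A fuller measure would need the list of expected participants as an extra input.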