Study reveals preference leakage bias when LLMs judge synthetically trained models
A new arXiv paper identifies preference leakage, a contamination problem in LLM-based evaluation in which a judge model systematically favors student models trained on data it synthesized. The researchers confirm the bias across multiple model families and benchmarks and find that it is harder to detect than previously identified LLM judge biases.
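To make the finding concrete, here is a minimal sketch of how such a bias could be quantified: compare the win rate a judge assigns to the student model trained on its own synthetic data against the win rates other judges assign that same student. The judge names, data layout, and the simple win-rate gap below are illustrative assumptions, not the paper's exact metric or code.

```python
def win_rate(judgments):
    """Fraction of pairwise comparisons the student won.

    `judgments` is a list of 1 (student won) / 0 (student lost);
    ties could be counted as 0.5 upstream.
    """
    return sum(judgments) / len(judgments)

def preference_leakage_score(results):
    """Estimate a leakage score for each student model.

    `results[judge][student]` is a list of pairwise outcomes (1 = that
    student beat a fixed opponent under that judge, 0 = lost). Each
    student is assumed to be trained on synthetic data from the judge
    it is named after. The score is the gap between the related judge's
    win rate for its own student and the mean win rate the remaining
    judges assign the same student -- a simple proxy, not necessarily
    the paper's definition.
    """
    scores = {}
    judges = list(results)
    for judge in judges:
        student = judge  # student named after its data-generating judge
        own = win_rate(results[judge][student])
        others = [win_rate(results[other][student])
                  for other in judges if other != judge]
        scores[student] = own - sum(others) / len(others)
    return scores

# Hypothetical outcomes: each judge scored both students on the same prompts.
results = {
    "judge_a": {"judge_a": [1, 1, 1, 0, 1], "judge_b": [0, 1, 0, 0, 1]},
    "judge_b": {"judge_a": [0, 1, 0, 1, 0], "judge_b": [1, 1, 0, 1, 1]},
}
print(preference_leakage_score(results))
```

A score well above zero for a student means its related judge rates it noticeably higher than the other judges do, which is the signature the paper's experiments detect across model families.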