Researchers expose 'preference leakage' bias in LLM judging systems
Researchers have identified a contamination problem, which they call preference leakage, in LLM-as-a-judge evaluation systems: judge models systematically favor data produced by generator models they are related to. The relationship can take three forms. The judge LLM may be the same model as the generator, may inherit from it, or may belong to the same model family. According to the researchers, this makes preference leakage harder to detect than previously identified biases in LLM evaluation.
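To make the three relationships concrete, here is a minimal Python sketch of how an evaluation pipeline might flag judge/generator pairs at risk and compare a judge's average scores on related versus unrelated generators. This is not the researchers' method or metric. The `Model` record, its `family` and `parent` fields, and the `leakage_gap` score (a plain difference of means) are hypothetical simplifications introduced only for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Model:
    # Hypothetical metadata record; real pipelines would need to source this.
    name: str
    family: str
    parent: Optional[str] = None  # name of the model this one was derived from


def is_related(judge: Model, generator: Model) -> bool:
    """True if the judge is the generator, inherits from it, or shares its family."""
    same_model = judge.name == generator.name
    inherits = judge.parent == generator.name  # judge derived from the generator
    same_family = judge.family == generator.family
    return same_model or inherits or same_family


def leakage_gap(judge: Model, scored: List[Tuple[Model, float]]) -> float:
    """Mean judge score for related generators minus mean for unrelated ones.

    A simplified, hypothetical indicator: a large positive gap suggests the
    judge may be favoring related models, but it does not control for
    genuine quality differences between the generators.
    """
    related = [score for gen, score in scored if is_related(judge, gen)]
    unrelated = [score for gen, score in scored if not is_related(judge, gen)]
    if not related or not unrelated:
        raise ValueError("need scores from both related and unrelated generators")
    return mean(related) - mean(unrelated)
```

A raw difference of means is deliberately crude: a real study would need matched prompts and a quality baseline (for example, human ratings) to separate favoritism toward related models from those models simply producing better outputs.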