LLM News | TPS

research

Researchers expose 'preference leakage' bias in LLM judging systems

Researchers have identified a contamination problem in LLM-as-a-judge evaluation, called preference leakage, in which a judge model systematically favors data generated by models related to it. The bias arises when the judge LLM is the same model as the generator, inherits from it, or belongs to the same model family, which makes it harder to detect than previously identified biases in LLM evaluation.
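The three kinds of relatedness named in the summary can be illustrated with a small check that flags judge/generator pairs at risk of preference leakage. This is a minimal sketch and not the researchers' method: the `ModelInfo` record, its `family` and `parent` fields, and the `relatedness` function are hypothetical names introduced here for illustration, and the sketch assumes that each model's lineage metadata is already known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical metadata record for a model (illustration only)."""
    name: str                     # model identifier
    family: str                   # model family label
    parent: Optional[str] = None  # name of the model it inherits from, if any


def relatedness(judge: ModelInfo, generator: ModelInfo) -> Optional[str]:
    """Return how the judge is related to the generator, or None if unrelated."""
    if judge.name == generator.name:
        return "same model"
    if judge.parent == generator.name:
        # The judge inherits from the generator.
        return "inheritance"
    if judge.family == generator.family:
        return "same family"
    return None


if __name__ == "__main__":
    # Placeholder model names, not real systems.
    generator = ModelInfo(name="model-a", family="fam-x")
    judge = ModelInfo(name="model-b", family="fam-y", parent="model-a")
    print(relatedness(judge, generator))
```

Any result other than `None` marks a pair whose judge scores may be affected by the bias described above. How strongly each relationship skews the scores is not something this sketch estimates.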

March 5, 2026 · 5:09 AM · 2 min read

benchmarking llm-evaluation contamination

via arxiv.org