LLM News

Every LLM release, update, and milestone.

research

Study reveals preference leakage bias when LLMs judge synthetically-trained models

A new arXiv paper identifies preference leakage, a contamination problem in LLM-based evaluation in which judge models systematically favor student models trained on data the judges themselves synthesized. The researchers confirm that the bias occurs across multiple model families and benchmarks, and that it is harder to detect than previously documented LLM-judge biases.
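One way to see the effect the paper describes is to compare a judge's win rates for a student trained on its own synthetic data against a control student. The sketch below is a hypothetical illustration (the model names and judgment data are invented, not from the paper): a positive gap means the judge prefers its own student more often.

```python
from collections import Counter

def win_rate(judgments, model):
    """Fraction of pairwise judgments the given model wins."""
    wins = Counter(judgments)
    return wins[model] / len(judgments)

# Hypothetical pairwise judgments by a judge model comparing two students:
# "related" was trained on data synthesized by the judge; "control" was not.
# Each entry names the student the judge preferred on one prompt.
judgments = ["related", "related", "control", "related", "related",
             "control", "related", "related", "related", "control"]

# A positive gap indicates the judge systematically favors its own student,
# which is the preference-leakage signature the study measures.
leakage_gap = win_rate(judgments, "related") - win_rate(judgments, "control")
print(f"judge preference gap toward its own student: {leakage_gap:+.2f}")
# -> judge preference gap toward its own student: +0.40
```

In practice the study controls for quality differences between students; a raw win-rate gap like this only signals leakage when the students are otherwise comparable.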