research
Researchers introduce Super Research benchmark for complex multi-step LLM reasoning
Researchers have introduced Super Research, a benchmark designed to evaluate how well large language models handle highly complex questions that require long-horizon planning, large-scale evidence gathering, and synthesis across heterogeneous sources. The benchmark consists of 300 expert-written questions across diverse domains, each of which can require 100 or more retrieval steps and the reconciliation of conflicting evidence across 1,000 or more web pages.