LLM News | TPS

research

Researchers introduce Super Research benchmark for complex multi-step LLM reasoning

Researchers have introduced Super Research, a benchmark designed to evaluate how well large language models handle highly complex questions that require long-horizon planning, large-scale evidence gathering, and synthesis across heterogeneous sources. The benchmark consists of 300 expert-written questions spanning diverse domains. Answering a single question can require more than 100 retrieval steps and the reconciliation of conflicting evidence drawn from over 1,000 web pages.

March 5, 2026 · 1:21 AM · 2 min read

benchmark research complex-reasoning

via arxiv.org