Reasoning models fail at theory of mind tasks despite math excellence
A systematic study of nine advanced language models finds that large reasoning models—those designed to excel at step-by-step math and coding—merely match, and often underperform, non-reasoning models on theory of mind tasks. The research identifies a critical weakness: longer reasoning chains actively degrade social reasoning performance, suggesting that current reasoning architectures do not transfer to socio-cognitive skills.