research
T2S-Bench benchmark reveals text-to-structure reasoning gap across 45 AI models
Researchers introduced T2S-Bench, a benchmark of 1,800 samples spanning 6 scientific domains and 32 structural types, and used it to evaluate text-to-structure reasoning in 45 mainstream models. The results reveal substantial capability gaps: average accuracy on multi-hop reasoning tasks is only 52.1%. Structure-of-Thought (SoT) prompting alone yields an average improvement of 5.7% across eight text-processing tasks.