benchmark-improvements
1 article tagged with benchmark-improvements
April 6, 2026
research
Alibaba's HopChain framework fixes vision model failures in multi-step reasoning tasks
Researchers from Alibaba's Qwen team and Tsinghua University developed HopChain, a framework that automatically generates multi-step image questions to address the failures of vision-language models on complex reasoning tasks. The method improved results on 20 of the 24 benchmarks tested by forcing models to re-examine the image at each reasoning step, which keeps early perceptual errors from cascading through later steps.