chain-of-thought
3 articles tagged with chain-of-thought
Alibaba's Qwen team develops algorithm that doubles reasoning chain length in math problems
Alibaba's Qwen team has developed Future-KL Influenced Policy Optimization (FIPO), a training algorithm that weights each token by its influence on subsequent reasoning steps instead of treating all tokens equally. In tests on Qwen2.5-32B-Base, reasoning chains roughly doubled in length, from about 4,000 to more than 10,000 tokens, and AIME 2024 accuracy rose from 50% to 58%, ahead of DeepSeek-R1-Zero-Math-32B (47%) and OpenAI's o1-mini (56%). The team plans to open-source the system.
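As a rough illustration of what non-uniform token weighting means, the sketch below computes a policy-gradient surrogate loss for one sampled response, once with equal token weights and once with per-token weights. It is not FIPO's objective: the summary does not give the weighting formula, so `weighted_policy_loss` and its `weights` argument are hypothetical stand-ins for whatever influence score FIPO actually computes.

```python
from typing import Optional, Sequence


def weighted_policy_loss(
    log_probs: Sequence[float],
    advantage: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Policy-gradient surrogate loss for one sampled response.

    log_probs: log-probability of each generated token under the policy.
    advantage: one scalar advantage shared by every token of the response.
    weights: hypothetical per-token influence weights; None treats all
        tokens equally (the uniform baseline FIPO departs from).
    """
    n = len(log_probs)
    if n == 0:
        raise ValueError("response must contain at least one token")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per token")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Rescale so the weights average 1: only the relative emphasis changes.
    scaled = [w * n / total for w in weights]
    # Minimizing this loss raises the weighted log-likelihood of the
    # response when the advantage is positive (and lowers it when negative).
    weighted_sum = sum(w * lp for w, lp in zip(scaled, log_probs))
    return -advantage * weighted_sum / n
```

With `log_probs=[-0.5, -1.0, -2.0]` and an advantage of 1.0, uniform weighting gives a loss of about 1.167, while the hypothetical weights `[1, 1, 2]` raise it to 1.375 by emphasizing the third token. Rescaling the weights to average 1 keeps the loss on the same scale as the uniform case, so the weighting only shifts which tokens the update focuses on.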
DeepSeek releases R1 reasoning model with chain-of-thought capabilities
DeepSeek has released DeepSeek-R1, a text-generation model that reasons through a chain of thought before answering. The model was published on January 20, 2025, and has accumulated more than 830,000 downloads on Hugging Face.
ByteDance study: reasoning models know when to stop, but sampling methods force continued thinking
A new ByteDance study finds that large reasoning models often recognize when they have reached the correct answer, but common sampling methods keep them from stopping. Instead, the models continue cross-checking and reformulating problems they have already solved correctly.
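A toy calculation shows how stochastic sampling can keep a model thinking even when it favors stopping. The setup is a hypothetical simplification, not the study's method or data: assume that at each of `k` decision points after the answer is found, the model gives an end-of-thinking token probability `p_stop`, and that this token is the single most likely choice. Greedy decoding would stop at the first such point, while pure sampling stops there only with probability `p_stop`, so (treating the points as independent) it runs past all of them with probability `(1 - p_stop) ** k`. The function names are illustrative.

```python
def prob_keeps_thinking(p_stop: float, opportunities: int) -> float:
    """Chance that pure sampling never emits the stop token.

    Hypothetical toy model: at each of `opportunities` independent decision
    points the model assigns probability `p_stop` to an end-of-thinking token.
    """
    if not 0.0 <= p_stop <= 1.0:
        raise ValueError("p_stop must be a probability")
    if opportunities < 0:
        raise ValueError("opportunities must be non-negative")
    return (1.0 - p_stop) ** opportunities


def greedy_stops(p_stop: float, other_probs: list[float]) -> bool:
    """Greedy decoding picks the stop token only if it beats every alternative."""
    return all(p_stop > q for q in other_probs)


if __name__ == "__main__":
    p = 0.4  # hypothetical: stop token is the single most likely next token
    rest = [0.3, 0.2, 0.1]  # hypothetical probabilities of "keep thinking" tokens
    print("greedy stops at first chance:", greedy_stops(p, rest))
    for k in (1, 3, 5):
        print(f"sampling still thinking after {k} chances:",
              round(prob_keeps_thinking(p, k), 4))
```

Under these made-up numbers, greedy decoding stops immediately, while sampling is still thinking after three chances about 22% of the time. Real decoding with temperature or top-p truncation changes the per-step probabilities, so this only shows the direction of the effect.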