LLM News

Every LLM release, update, and milestone.

benchmark

New benchmark reveals LLMs struggle with graduate-level math and computational reasoning

Researchers have released CompMath-MCQ, a benchmark dataset of 1,500 originally authored graduate-level mathematics questions designed to test LLM performance on advanced topics. The dataset covers linear algebra, numerical optimization, vector calculus, probability, and Python-based scientific computing, areas largely absent from existing math benchmarks. Baseline testing with state-of-the-art LLMs indicates that advanced computational mathematical reasoning remains a significant challenge.

2 min read · via arxiv.org
research

Code agents can evolve math problems into harder variants, study finds

A new study demonstrates that code agents can autonomously evolve existing math problems into more complex variants that remain solvable, using systematic exploration. The multi-agent framework addresses a critical bottleneck in training advanced LLMs toward International Mathematical Olympiad (IMO)-level mathematical reasoning by providing a scalable mechanism for synthesizing high-difficulty problems.

2 min read · via arxiv.org