LLM News

Every LLM release, update, and milestone.

research

Code agents can evolve math problems into harder variants, study finds

A new study demonstrates that code agents can autonomously evolve existing math problems into more complex yet still-solvable variants through systematic exploration. The multi-agent framework addresses a critical bottleneck in training advanced LLMs toward IMO-level mathematical reasoning: it provides a scalable mechanism for synthesizing high-difficulty problems.

2 min read · via arxiv.org
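The evolve-and-verify loop such a framework implies can be sketched in a few lines. This is a hypothetical illustration, not the study's actual system: the problem representation (plain arithmetic strings), the `mutate`/`solve`/`evolve` names, and the keep-only-verified-variants rule are all assumptions made for the sketch.

```python
import random

def solve(expr: str) -> int:
    # Stand-in "solver": evaluate the arithmetic expression directly.
    # A real agent would run generated code in a sandbox instead of eval().
    return eval(expr)

def mutate(expr: str, rng: random.Random) -> str:
    # Compose the expression with one more operation to raise difficulty.
    op = rng.choice(["+", "-", "*"])
    k = rng.randint(2, 9)
    return f"({expr}) {op} {k}"

def evolve(seed: str, steps: int, rng: random.Random) -> list[tuple[str, int]]:
    # Iteratively harden the seed problem, keeping only variants
    # whose answer the solver can still verify.
    variants, current = [], seed
    for _ in range(steps):
        candidate = mutate(current, rng)
        answer = solve(candidate)  # verification step: variant must stay solvable
        variants.append((candidate, answer))
        current = candidate
    return variants

rng = random.Random(0)
chain = evolve("3 + 4", steps=3, rng=rng)
```

Each entry in `chain` pairs a harder variant with its verified answer, which is the shape of training data the study's pipeline is described as producing at scale.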
research

New benchmark reveals code agents struggle to understand software architecture

A new research benchmark called Theory of Code Space (ToCS) exposes a critical limitation in AI code agents: they cannot reliably build and maintain understanding of software architecture during codebase exploration. The benchmark places agents in procedurally generated Python projects with partial observability, revealing that even frontier LLM agents score poorly at discovering module dependencies and cross-cutting invariants.
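One way to picture the kind of measurement described is scoring an agent's reported module-dependency edges against the project's true import graph. The following is a hypothetical sketch, not ToCS's actual protocol: the toy project, the edge representation, and F1 scoring are all assumptions made for illustration.

```python
import ast

# Toy "project": module name -> source text.
PROJECT = {
    "app":   "import db\nimport utils\n",
    "db":    "import utils\n",
    "utils": "",
}

def true_edges(project: dict[str, str]) -> set[tuple[str, str]]:
    # Ground-truth dependency edges, recovered by parsing each module's imports.
    edges = set()
    for name, src in project.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    if alias.name in project:
                        edges.add((name, alias.name))
    return edges

def f1(predicted: set, truth: set) -> float:
    # F1 over dependency edges: harmonic mean of precision and recall.
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    p, r = tp / len(predicted), tp / len(truth)
    return 2 * p * r / (p + r)

truth = true_edges(PROJECT)
agent_guess = {("app", "db"), ("app", "utils")}  # agent missed db -> utils
score = f1(agent_guess, truth)
```

Here the agent's guess is precise but incomplete, so the score falls below 1.0, which is the failure mode the benchmark is reported to surface: agents miss dependencies they never explored.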