LLM News

Every LLM release, update, and milestone.

research

New benchmark reveals code agents struggle to understand software architecture

A new research benchmark, Theory of Code Space (ToCS), exposes a critical limitation in AI code agents: they cannot reliably build and maintain an understanding of software architecture while exploring a codebase. The benchmark places agents in procedurally generated Python projects under partial observability and finds that even frontier LLM agents score poorly at discovering module dependencies and cross-cutting invariants.
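
To make "discovering module dependencies" concrete, here is a minimal sketch of recovering an import graph from Python source with the standard-library `ast` module. It is an illustration only, not the ToCS scoring method: the function name `import_graph` and the three-module project are hypothetical, and the sketch assumes flat, absolute imports within the project.

```python
import ast


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the project modules it imports.

    `sources` maps a module name to its source text (hypothetical input
    format chosen for this sketch).
    """
    graph: dict[str, set[str]] = {}
    for name, code in sources.items():
        deps: set[str] = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                # `import a, b` yields one alias per imported module
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # `from a import x`; relative imports are skipped in this sketch
                deps.add(node.module)
        # keep only edges that point at other modules in the same project
        graph[name] = {d for d in deps if d in sources and d != name}
    return graph


# Hypothetical three-module project, invented for illustration.
project = {
    "app": "import db\nfrom utils import helper\nimport os\n",
    "db": "from utils import helper\n",
    "utils": "def helper():\n    return 1\n",
}
graph = import_graph(project)
```

Under partial observability an agent has read only some of the files, so the graph it can build this way covers only the modules it has opened; passing a subset of `project` gives a rough feel for that.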