
Neural Paging System Reduces LLM Context Management Complexity from O(N²) to O(N·K²)

A new research paper introduces Neural Paging, a hierarchical architecture that optimizes how LLMs manage their limited context windows by learning semantic caching policies. The approach reduces asymptotic complexity for long-horizon reasoning from O(N²) to O(N·K²) under bounded context window size K, addressing a fundamental bottleneck in deploying universal agents with external memory.


Context Window as Semantic Cache, Not Infinite Memory

Large Language Models augmented with external read-write memory are theoretically Turing-complete, enabling general-purpose agents. But theory diverges sharply from practice: the context window remains a costly, finite bottleneck. Existing systems treat it as unlimited semantic storage. It isn't.

Researchers now propose a solution: Neural Paging, a learned context management system that explicitly optimizes token retention based on predicted future utility.

The Problem: Quadratic Complexity Spiral

Long-horizon reasoning in current LLM systems scales poorly. As an agent reasons over extended sequences, attending over the full history costs O(N²) operations: compute grows quadratically with task length N. This makes complex, multi-step reasoning prohibitively expensive.

The core issue: which tokens matter most for future steps? Today's systems either keep everything (wasteful) or use heuristics like recency (suboptimal).
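The recency heuristic the article mentions can be stated in a few lines. This is a minimal sketch for contrast, not anything from the paper; `recency_policy` is a hypothetical name:

```python
def recency_policy(tokens, k):
    """Keep only the k most recent tokens: the simple heuristic the
    article calls suboptimal, since early tokens (e.g. the original
    task instructions) may carry high future utility."""
    return tokens[-k:]

# A 10-token history truncated to a window of 4 drops everything early.
context = [f"tok{i}" for i in range(10)]
print(recency_policy(context, 4))  # ['tok6', 'tok7', 'tok8', 'tok9']
```

Whatever mattered in the dropped prefix is gone for good, which is exactly the failure mode a learned retention policy is meant to avoid.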

Neural Paging: A Differentiable Page Controller

The paper introduces a lightweight, trainable Page Controller that learns to approximate "Semantic Belady's Optimality", a nod to Belady's optimal replacement policy from classical memory management: evict the item whose next use lies furthest in the future. Instead of guessing, the controller learns which tokens have the highest future utility under explicit assumptions about access patterns.

The architecture decouples symbolic reasoning (what the agent thinks about) from resource management (what tokens to keep). This separation allows the paging policy to be optimized independently, then deployed across different reasoning tasks.
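The paper does not provide an implementation, but the decoupling can be sketched as a controller that only scores and evicts tokens, independent of the reasoning model. All names here (`PageController`, `score`, `page`) are hypothetical, and a plain linear scorer stands in for the trained differentiable policy:

```python
import random

class PageController:
    """Illustrative sketch: a tiny linear model scores each token's
    predicted future utility, and the context keeps only the top-K
    scorers. Not the paper's architecture."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Stand-in for learned parameters of the differentiable scorer.
        self.w = [rng.gauss(0, 1) for _ in range(dim)]

    def score(self, emb):
        # Predicted future utility of one token embedding.
        return sum(wi * xi for wi, xi in zip(self.w, emb))

    def page(self, tokens, embs, k):
        # Rank tokens by predicted utility, retain the top-K,
        # and preserve their original order in the context.
        ranked = sorted(range(len(tokens)),
                        key=lambda i: self.score(embs[i]),
                        reverse=True)
        keep = sorted(ranked[:k])
        return [tokens[i] for i in keep]
```

Because the controller never touches the reasoning trace itself, the same paging policy can, in this framing, be swapped across different tasks or models.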

Theoretical Guarantees

The work provides formal analysis:

  • Complexity reduction: Under a bounded context window of size K, Neural Paging reduces asymptotic complexity from O(N²) to O(N·K²). For task horizons N much larger than the fixed window K, this is substantial: cost grows linearly rather than quadratically in N.
  • Robustness bound (Theorem 4): The authors quantify how performance degrades when access patterns differ from training assumptions, measuring competitive-ratio degradation under policy-dependent access with bounded sensitivity.
  • Validation: Theoretical bounds were tested on synthetic paging traces, confirming guarantees hold and identifying remaining slack—suggesting learned policies could improve further with better training.
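The complexity claim above can be made concrete with the steps-times-window arithmetic. The function names and the numbers are illustrative, not from the paper:

```python
def full_context_cost(n):
    # Quadratic attention over the entire history: ~N^2 token interactions.
    return n * n

def paged_cost(n, k):
    # A bounded window of size K at each of N steps: ~N * K^2 interactions.
    return n * k * k

# Illustrative values: a long horizon with a small paged window.
n, k = 1_000_000, 128
print(full_context_cost(n) // paged_cost(n, k))  # → 61
```

The full-history cost exceeds the paged cost by a factor of roughly N/K², so the advantage only materializes once the horizon N outgrows K².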

What This Means

Neural Paging addresses a fundamental limitation in deploying agentic AI systems. Current LLM agents hit a wall: longer tasks demand quadratically more compute due to context overhead. This work shows a path forward.

The practical implication: agents could tackle longer-horizon problems such as research, complex planning, and code generation without proportional cost increases. The learned policy is lightweight and differentiable, making it trainable, in principle, alongside any LLM architecture.

However, the work remains theoretical, with validation limited to synthetic paging traces. Real-world deployment would require testing on actual agent tasks and measuring wall-clock speedups, not just asymptotic bounds. The slack identified between the learned policies and the theoretical bound also means those policies aren't yet optimal; there is room for improvement before hitting the ceiling.

This is foundational work for scaling agentic AI beyond prototype systems.
