product updateAmazon Web Services

AWS launches AgentCore Code Interpreter to process documents beyond context window limits using recursive LLM architectu

TL;DR

Amazon Web Services released AgentCore Code Interpreter, a sandboxed Python environment that enables recursive language models to process documents of unlimited length by treating context as an external environment rather than loading it into the model's context window. The system orchestrates sub-LLM calls from within the sandbox, maintaining intermediate results as Python variables across a persistent session.

3 min read
0

AWS launches AgentCore Code Interpreter to process documents beyond context window limits using recursive LLM architecture

Amazon Web Services released AgentCore Code Interpreter, a sandboxed Python runtime that implements recursive language models (RLMs) to analyze documents of unlimited length without context window constraints.

How it works

The system treats input documents as an external environment rather than loading them directly into a model's context window. A root LLM agent writes Python code to search and slice documents iteratively, delegating semantic analysis to sub-LLM calls that keep results in working memory as Python variables.

The architecture has three components:

  1. A root LLM agent built with the Strands Agents SDK that receives queries and generates code
  2. An AgentCore Code Interpreter session running in PUBLIC network mode with the full document loaded as a Python variable
  3. An llm_query() function injected into the sandbox that calls Amazon Bedrock directly, keeping sub-LLM results in Python variables instead of the root LLM's context window

The persistent session state accumulates variables and intermediate results across multiple code executions, providing working memory throughout the analysis.

Technical specifications

According to AWS, the system:

  • Supports documents of varying lengths with no upper bound on context size
  • Maintains persistent state across executions in a sandboxed Python 3.10+ environment
  • Requires IAM permissions for bedrock:InvokeModel, bedrock-agentcore:StartCodeInterpreterSession, bedrock-agentcore:InvokeCodeInterpreter, and bedrock-agentcore:StopCodeInterpreterSession
  • Sets maximum session timeout at 3,600 seconds (1 hour)
  • Uses PUBLIC network mode to enable outbound API calls to Amazon Bedrock

Evaluation results

AWS tested the system on the Financial Multi-Document QA subset of LongBench v2, a benchmark with 15 multiple-choice questions requiring analysis across multiple financial reports with context lengths up to approximately 2 million characters.

The company compared RLM against two baselines:

  • Base approach: Sending the full document directly to the model with a 200K token context window
  • Long Context approach: Using Claude's 1 million token context window

AWS measured success rate (percentage of questions processed without errors) and accuracy (percentage of correct answers). Specific numeric results were not disclosed in the announcement.

Implementation requirements

Developers need:

  • AWS account with access to Amazon Bedrock foundation models
  • Python 3.10 or later
  • AWS CLI configured with appropriate credentials
  • AgentCore Code Interpreter configured with PUBLIC network mode
  • Strands Agents SDK for orchestration

The implementation involves starting a Code Interpreter session, loading documents into the sandbox, defining the llm_query() helper function, and creating a Strands Agent with an execute_python tool.

What this means

This approach addresses the fundamental limitation of fixed context windows by changing the interaction model between LLMs and long documents. Instead of expanding context windows—which remain bounded and suffer from attention degradation in long inputs—the recursive architecture treats documents as queryable environments. The practical impact is that document length becomes decoupled from model limitations, enabling analysis of arbitrarily long financial reports, legal documents, or technical specifications without preprocessing or chunking strategies. However, the system adds complexity through code generation and orchestration overhead, and the actual performance gains depend on the quality of the root LLM's code generation and the effectiveness of its document exploration strategy.

Related Articles

product update

Google opens 'Gemini built in' program to third-party speaker manufacturers with turnkey reference designs

Google is expanding its 'Gemini built in' program to include speaker reference designs, allowing third-party manufacturers to build Gemini-powered smart speakers without lengthy development cycles. The program, which previously launched cameras through Walmart's Onn brand, now provides turnkey hardware solutions for both speakers and cameras.

product update

AWS Launches Amazon Bedrock AgentCore for Deploying Production AI Agents

AWS has launched Amazon Bedrock AgentCore, a serverless runtime environment for deploying production AI agents. Turkish fulfillment company OPLOG demonstrated the platform's capabilities by building three business intelligence agents using Anthropic's Claude Sonnet, achieving a 35% reduction in sales cycles and 98% reduction in manual research time.

product update

Perplexity upgrades Comet iOS browser with phone number actions, iPad sidebar polish, Finance Deep Dive tabs

Perplexity has released a major update to its Comet AI browser for iOS, adding eight new features including one-tap phone number actions, a redesigned iPad sidebar, and Finance Deep Dive analysis that opens as browser tabs. The update also fixes persistent bugs with recently closed tabs and deleted conversation threads.

product update

Replit launches self-serve Enterprise with instant SSO setup, unlimited seats, credit-based pricing

Replit today launched self-serve Enterprise, allowing organizations to purchase and configure an Enterprise account with SSO, SCIM, and RBAC in minutes without sales calls. The credit-based model provides unlimited seats, with all spending pooled across Replit Agent, Deployments, and Storage.

Comments

Loading...