AWS launches AgentCore Code Interpreter to process documents beyond context window limits using recursive LLM architectu
Amazon Web Services released AgentCore Code Interpreter, a sandboxed Python environment that enables recursive language models to process documents of unlimited length by treating context as an external environment rather than loading it into the model's context window. The system orchestrates sub-LLM calls from within the sandbox, maintaining intermediate results as Python variables across a persistent session.
AWS launches AgentCore Code Interpreter to process documents beyond context window limits using recursive LLM architecture
Amazon Web Services released AgentCore Code Interpreter, a sandboxed Python runtime that implements recursive language models (RLMs) to analyze documents of unlimited length without context window constraints.
How it works
The system treats input documents as an external environment rather than loading them directly into a model's context window. A root LLM agent writes Python code to search and slice documents iteratively, delegating semantic analysis to sub-LLM calls that keep results in working memory as Python variables.
The architecture has three components:
- A root LLM agent built with the Strands Agents SDK that receives queries and generates code
- An AgentCore Code Interpreter session running in PUBLIC network mode with the full document loaded as a Python variable
- An
llm_query()function injected into the sandbox that calls Amazon Bedrock directly, keeping sub-LLM results in Python variables instead of the root LLM's context window
The persistent session state accumulates variables and intermediate results across multiple code executions, providing working memory throughout the analysis.
Technical specifications
According to AWS, the system:
- Supports documents of varying lengths with no upper bound on context size
- Maintains persistent state across executions in a sandboxed Python 3.10+ environment
- Requires IAM permissions for
bedrock:InvokeModel,bedrock-agentcore:StartCodeInterpreterSession,bedrock-agentcore:InvokeCodeInterpreter, andbedrock-agentcore:StopCodeInterpreterSession - Sets maximum session timeout at 3,600 seconds (1 hour)
- Uses PUBLIC network mode to enable outbound API calls to Amazon Bedrock
Evaluation results
AWS tested the system on the Financial Multi-Document QA subset of LongBench v2, a benchmark with 15 multiple-choice questions requiring analysis across multiple financial reports with context lengths up to approximately 2 million characters.
The company compared RLM against two baselines:
- Base approach: Sending the full document directly to the model with a 200K token context window
- Long Context approach: Using Claude's 1 million token context window
AWS measured success rate (percentage of questions processed without errors) and accuracy (percentage of correct answers). Specific numeric results were not disclosed in the announcement.
Implementation requirements
Developers need:
- AWS account with access to Amazon Bedrock foundation models
- Python 3.10 or later
- AWS CLI configured with appropriate credentials
- AgentCore Code Interpreter configured with PUBLIC network mode
- Strands Agents SDK for orchestration
The implementation involves starting a Code Interpreter session, loading documents into the sandbox, defining the llm_query() helper function, and creating a Strands Agent with an execute_python tool.
What this means
This approach addresses the fundamental limitation of fixed context windows by changing the interaction model between LLMs and long documents. Instead of expanding context windows—which remain bounded and suffer from attention degradation in long inputs—the recursive architecture treats documents as queryable environments. The practical impact is that document length becomes decoupled from model limitations, enabling analysis of arbitrarily long financial reports, legal documents, or technical specifications without preprocessing or chunking strategies. However, the system adds complexity through code generation and orchestration overhead, and the actual performance gains depend on the quality of the root LLM's code generation and the effectiveness of its document exploration strategy.
Related Articles
Google AI Plus at $4.99/month and AI Pro at $19.99/month expand Gemini context windows to 128K and 1M tokens
Google has detailed pricing and features for its Gemini app subscription tiers. AI Plus costs $4.99/month and includes 128,000 token context windows, while AI Pro at $19.99/month provides 1 million token context windows. Free users are limited to 32,000 tokens.
AWS enables fine-tuning of Amazon Nova models for email extraction, achieving 94.77% accuracy with 50% cost reduction
AWS released guidance on fine-tuning Amazon Nova Micro and Nova Lite models for automated email data extraction using SageMaker AI. In collaboration with Parcel Perform, the fine-tuned Nova Micro achieved 94.77% extraction accuracy—a 16.6 percentage point improvement—while reducing inference costs by 50% and latency by 30% compared to previous models.
Anthropic launches Claude Science beta with NVIDIA BioNeMo integration for life sciences research
Anthropic has launched the public beta of Claude Science, an AI workbench for scientific research that integrates NVIDIA's BioNeMo Agent Toolkit. The platform allows scientists to execute end-to-end research workflows using natural language commands to interact with digital agents.
Apple ships Safari MCP server in Technology Preview 247, enabling AI coding agents to inspect and debug websites
Apple has released an MCP server for Safari Technology Preview 247 that allows AI coding agents to directly inspect and debug websites. The server gives agents access to console logs, network requests, screenshots, and DOM interactions through the Model Context Protocol standard created by Anthropic.
Comments
Loading...