AWS launches AgentCore Code Interpreter to process documents beyond context window limits using recursive LLM architectu

TL;DR

Amazon Web Services released AgentCore Code Interpreter, a sandboxed Python environment that enables recursive language models to process documents of unlimited length by treating context as an external environment rather than loading it into the model's context window. The system orchestrates sub-LLM calls from within the sandbox, maintaining intermediate results as Python variables across a persistent session.

May 21, 2026 · 4:21 PM3 min read

AWS launches AgentCore Code Interpreter to process documents beyond context window limits using recursive LLM architecture

Amazon Web Services released AgentCore Code Interpreter, a sandboxed Python runtime that implements recursive language models (RLMs) to analyze documents of unlimited length without context window constraints.

How it works

The system treats input documents as an external environment rather than loading them directly into a model's context window. A root LLM agent writes Python code to search and slice documents iteratively, delegating semantic analysis to sub-LLM calls that keep results in working memory as Python variables.

The architecture has three components:

A root LLM agent built with the Strands Agents SDK that receives queries and generates code
An AgentCore Code Interpreter session running in PUBLIC network mode with the full document loaded as a Python variable
An llm_query() function injected into the sandbox that calls Amazon Bedrock directly, keeping sub-LLM results in Python variables instead of the root LLM's context window

The persistent session state accumulates variables and intermediate results across multiple code executions, providing working memory throughout the analysis.

Technical specifications

According to AWS, the system:

Supports documents of varying lengths with no upper bound on context size
Maintains persistent state across executions in a sandboxed Python 3.10+ environment
Requires IAM permissions for bedrock:InvokeModel, bedrock-agentcore:StartCodeInterpreterSession, bedrock-agentcore:InvokeCodeInterpreter, and bedrock-agentcore:StopCodeInterpreterSession
Sets maximum session timeout at 3,600 seconds (1 hour)
Uses PUBLIC network mode to enable outbound API calls to Amazon Bedrock

Evaluation results

AWS tested the system on the Financial Multi-Document QA subset of LongBench v2, a benchmark with 15 multiple-choice questions requiring analysis across multiple financial reports with context lengths up to approximately 2 million characters.

The company compared RLM against two baselines:

Base approach: Sending the full document directly to the model with a 200K token context window
Long Context approach: Using Claude's 1 million token context window

AWS measured success rate (percentage of questions processed without errors) and accuracy (percentage of correct answers). Specific numeric results were not disclosed in the announcement.

Implementation requirements

Developers need:

AWS account with access to Amazon Bedrock foundation models
Python 3.10 or later
AWS CLI configured with appropriate credentials
AgentCore Code Interpreter configured with PUBLIC network mode
Strands Agents SDK for orchestration

The implementation involves starting a Code Interpreter session, loading documents into the sandbox, defining the llm_query() helper function, and creating a Strands Agent with an execute_python tool.

What this means

This approach addresses the fundamental limitation of fixed context windows by changing the interaction model between LLMs and long documents. Instead of expanding context windows—which remain bounded and suffer from attention degradation in long inputs—the recursive architecture treats documents as queryable environments. The practical impact is that document length becomes decoupled from model limitations, enabling analysis of arbitrarily long financial reports, legal documents, or technical specifications without preprocessing or chunking strategies. However, the system adds complexity through code generation and orchestration overhead, and the actual performance gains depend on the quality of the root LLM's code generation and the effectiveness of its document exploration strategy.

Source: aws.amazon.com ↗

amazon-aws context-window code-interpreter recursive-llm document-analysis financial-qa longbench-v2 bedrock

product updateJuly 4, 2026

Google AI Plus at $4.99/month and AI Pro at $19.99/month expand Gemini context windows to 128K and 1M tokens

Google has detailed pricing and features for its Gemini app subscription tiers. AI Plus costs $4.99/month and includes 128,000 token context windows, while AI Pro at $19.99/month provides 1 million token context windows. Free users are limited to 32,000 tokens.

product updateJune 30, 2026

AWS enables fine-tuning of Amazon Nova models for email extraction, achieving 94.77% accuracy with 50% cost reduction

AWS released guidance on fine-tuning Amazon Nova Micro and Nova Lite models for automated email data extraction using SageMaker AI. In collaboration with Parcel Perform, the fine-tuned Nova Micro achieved 94.77% extraction accuracy—a 16.6 percentage point improvement—while reducing inference costs by 50% and latency by 30% compared to previous models.

product updateJuly 2, 2026

Anthropic launches Claude Science beta with NVIDIA BioNeMo integration for life sciences research

Anthropic has launched the public beta of Claude Science, an AI workbench for scientific research that integrates NVIDIA's BioNeMo Agent Toolkit. The platform allows scientists to execute end-to-end research workflows using natural language commands to interact with digital agents.

product updateJuly 1, 2026

Apple ships Safari MCP server in Technology Preview 247, enabling AI coding agents to inspect and debug websites

Apple has released an MCP server for Safari Technology Preview 247 that allows AI coding agents to directly inspect and debug websites. The server gives agents access to console logs, network requests, screenshots, and DOM interactions through the Model Context Protocol standard created by Anthropic.