Amazon Bedrock adds programmatic tool calling to reduce latency and token usage in multi-step workflows
Amazon Bedrock now supports programmatic tool calling (PTC), a technique that allows LLMs to generate Python code for multi-step tool orchestration rather than making sequential API calls. AWS offers three implementation paths: self-hosted Docker sandboxes on ECS, managed execution via Amazon Bedrock AgentCore Code Interpreter, and Anthropic SDK-compatible proxy integration.
Amazon Bedrock adds programmatic tool calling to reduce latency and token usage in multi-step workflows
Amazon Web Services has introduced programmatic tool calling (PTC) for Amazon Bedrock, enabling language models to generate Python code that orchestrates multiple tool invocations in a single inference cycle rather than requiring sequential round trips.
How it works
Traditional tool calling requires models to invoke tools one at a time, with each call requiring a full inference round trip. For a query like "Which engineering team members exceeded their Q3 travel budget?", a model using standard tool calling would need to make 20+ separate API calls to retrieve team members and expense records, passing thousands of intermediate data points through its context window.
With PTC, the model generates Python code once that handles all tool calls, data processing, and filtering within a sandboxed execution environment. Using asyncio.gather(), the code can execute tool calls in parallel. Only the final processed result returns to the model's context.
According to AWS, this approach reduces both latency and token consumption for workflows involving multiple tool calls, data aggregation, or numerical calculations.
Three implementation options
AWS provides three ways to implement PTC on Bedrock:
Self-hosted Docker sandbox on Amazon ECS: Offers maximum control with model-agnostic support for Claude, Qwen, MiniMax, Llama, Nova, and other Bedrock models. Developers can customize the sandbox environment, install domain-specific Python packages, and keep code execution within their AWS account. The architecture uses an orchestrator (ECS task or Lambda) that calls the InvokeModel API via Boto3 and manages Docker sandbox lifecycle.
Managed solution via Amazon Bedrock AgentCore Code Interpreter: A fully managed sandbox environment that handles execution without requiring custom infrastructure.
Anthropic SDK-compatible proxy: Designed for teams already using Anthropic's SDK who want PTC functionality while maintaining their existing developer workflow.
System prompt engineering
The self-hosted implementation relies on injecting tool definitions into the system prompt rather than using the standard tool_config parameter. The prompt instructs models to write Python code with specific rules: each execute_code call runs in a fresh stateless environment, tool calls must use await, and all operations must complete in a single code block.
The orchestrator intercepts tool calls through IPC over stdin/stderr, executes them externally, and injects results back into the sandbox.
What this means
PTC addresses a legitimate bottleneck in agentic workflows where sequential tool calling creates compounding latency. The model-agnostic self-hosted option is particularly significant—it extends a pattern originally introduced by specific providers to any model available on Bedrock. This matters for enterprises already committed to AWS infrastructure who want to avoid vendor lock-in at the model level.
The technique works best for workflows involving data aggregation, filtering operations, or scenarios where intermediate data shouldn't enter the model's context for privacy reasons. However, it requires models capable of generating correct async Python code and adds complexity around sandbox security and resource management.
Related Articles
Google launches Antigravity 2.0 with desktop app, Go-based CLI, and SDK at $100/month
Google announced Antigravity 2.0 at I/O 2026, transforming its coding tool into a full developer platform with a revamped desktop app supporting multi-agent orchestration, a new Go-based CLI, and an SDK for custom agents. The company introduced a $100/month AI Ultra tier and confirmed Gemini CLI will shut down for consumers on June 18, 2026.
Google launches Universal Cart, an AI agent that shops across multiple retailers in one checkout
Google announced Universal Cart at its I/O developer conference, an AI-powered shopping system that consolidates purchases from multiple retailers including Target, Shopify, Wayfair, and Etsy into a single checkout. The feature uses Gemini's agentic AI to verify product compatibility, suggest better deals, and automate routine purchases.
llm-gemini Plugin Adds Support for Google's Gemini 3.5 Flash Model
Developer Simon Willison released version 0.32 of the llm-gemini plugin, which adds support for Google's Gemini 3.5 Flash model. The plugin enables command-line access to Google's Gemini model family through the LLM tool.
AWS releases four multimodal evaluators for image-to-text AI tasks in Strands Evals SDK
AWS has added four multimodal evaluators to its Strands Evals SDK that judge image-to-text AI outputs by directly analyzing source images. The evaluators—Overall Quality, Correctness, Faithfulness, and Instruction Following—use multimodal large language models to detect visual hallucinations, factual errors, and instruction violations that text-only judges miss.
Comments
Loading...