EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents
Researchers have proposed EvoTool, a gradient-free framework that optimizes tool-use policies in LLM agents through evolutionary algorithms and modular decomposition. The approach addresses a fundamental challenge in agent development: improving how language models decide which tools to use, when to use them, and how to coordinate multiple operations in long-horizon tasks.
Core Problem
LLM-based agents struggle with effective tool-use policy optimization due to two factors: delayed supervision (feedback arrives only after many steps) and credit assignment difficulty (determining which module caused failures in multi-step trajectories). Existing optimization methods either treat the entire policy as a monolithic unit—risking entangled behaviors—or optimize individual aspects without accounting for how errors propagate across modules.
Architecture and Mechanisms
EvoTool decomposes agent tool-use policies into four distinct modules:
- Planner: Decides what subtasks to perform
- Selector: Chooses which tools to use
- Caller: Executes selected tools with appropriate parameters
- Synthesizer: Integrates tool outputs into final responses
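A minimal sketch of this decomposition, assuming each module is represented by its own natural-language prompt (the authors' code is not yet released, so the `ToolUsePolicy` class and its string-prompt representation are illustrative assumptions, not the paper's implementation):

```python
from dataclasses import dataclass

@dataclass
class ToolUsePolicy:
    """Hypothetical container for the four EvoTool modules."""
    planner: str      # governs how the task is split into subtasks
    selector: str     # governs which tool is chosen per subtask
    caller: str       # governs how tool arguments are filled in
    synthesizer: str  # governs how tool outputs become the final answer

    def modules(self) -> dict:
        # Expose modules by name so each can be inspected or mutated alone
        return {
            "planner": self.planner,
            "selector": self.selector,
            "caller": self.caller,
            "synthesizer": self.synthesizer,
        }

policy = ToolUsePolicy(
    planner="Break the task into ordered subtasks.",
    selector="Pick the single most relevant tool per subtask.",
    caller="Fill tool arguments strictly from the subtask context.",
    synthesizer="Merge tool outputs into one grounded answer.",
)
```

Keeping the modules separable is what later allows the framework to edit one of them without touching the other three.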
The framework operates through three novel mechanisms:
Trajectory-Grounded Blame Attribution uses diagnostic traces from agent execution to isolate failures to specific modules, rather than blaming the entire policy.
Feedback-Guided Targeted Mutation edits only the failing module via natural-language critique, improving efficiency by avoiding unnecessary modifications to working components.
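A sketch of the mutation step under the same assumptions as above, where `revise` is a placeholder for an LLM call that rewrites a prompt given a critique:

```python
def revise(prompt, critique):
    # Placeholder for an LLM rewrite; here it just appends the critique.
    return f"{prompt}\n# Revision note: {critique}"

def mutate(policy, blamed, critique):
    """Return a child policy in which only the blamed module is rewritten."""
    child = dict(policy)  # untouched modules are copied verbatim
    child[blamed] = revise(policy[blamed], critique)
    return child

parent = {"planner": "p", "selector": "s", "caller": "c", "synthesizer": "z"}
child = mutate(parent, "caller", "validate argument types before calling")
```

Because the edit is confined to the blamed module, working components cannot be destabilized by the mutation, which is the efficiency argument the paper makes.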
Diversity-Aware Population Selection maintains a population of candidate solutions with complementary strengths, preventing premature convergence and ensuring the algorithm explores diverse solution spaces.
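One simple way to realize this kind of selection (a toy stand-in, not the paper's method) is greedy selection with a diversity penalty: each candidate's fitness is discounted by its behavioral overlap with candidates already kept, here measured as Jaccard similarity over the set of benchmark tasks each candidate solves:

```python
def jaccard(a, b):
    """Overlap of two behavior signatures (sets of solved task ids)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def select(pop, k, penalty=0.5):
    """Greedily keep k candidates, trading raw fitness against redundancy."""
    kept, pool = [], list(pop)
    while pool and len(kept) < k:
        best = max(
            pool,
            key=lambda c: c["fitness"]
            - penalty * max((jaccard(c["solved"], s["solved"]) for s in kept),
                            default=0.0),
        )
        kept.append(best)
        pool.remove(best)
    return kept

pop = [
    {"name": "A", "fitness": 0.90, "solved": {1, 2, 3}},
    {"name": "B", "fitness": 0.88, "solved": {1, 2, 3}},  # behavioral clone of A
    {"name": "C", "fitness": 0.70, "solved": {4, 5}},     # complementary strengths
]
survivors = select(pop, k=2)
# Keeps A and the complementary C rather than the near-duplicate B
```

Plain top-k selection would keep A and B and lose C's complementary coverage, which is precisely the premature convergence the mechanism is meant to prevent.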
Benchmark Results
Across four benchmarks, EvoTool outperformed strong baselines by over 5 percentage points when tested on both GPT-4.1 and Qwen3-8B. The framework also demonstrated superior efficiency compared to baselines and showed strong transferability—policies optimized on one model or task transferred effectively to others.
The gradient-free evolutionary approach eliminates the need for backpropagation through agent trajectories, which is computationally expensive and technically challenging in long-horizon settings.
Implementation Status
The paper is available as a preprint on arXiv (2603.04900) and is under review. The authors indicate code will be released upon paper acceptance, though a release date has not been announced.
What This Means
EvoTool represents a practical alternative to gradient-based optimization for agent policy improvement. By focusing on modular decomposition and targeted mutation, the approach addresses real pain points in agent development: understanding why agents fail and improving specific failure modes without destabilizing working components. The transferability results are particularly significant, suggesting policies learned on one LLM might generalize to others—reducing the need for repeated optimization across different models. This could accelerate deployment of tool-using agents in production systems.