EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents
Researchers have proposed EvoTool, a gradient-free framework that optimizes tool-use policies in LLM agents through evolutionary algorithms and modular decomposition. The approach addresses a fundamental challenge in agent development: improving how language models decide which tools to use, when to use them, and how to coordinate multiple operations in long-horizon tasks.
Core Problem
LLM-based agents struggle with effective tool-use policy optimization due to two factors: delayed supervision (feedback arrives only after many steps) and credit assignment difficulty (determining which module caused failures in multi-step trajectories). Existing optimization methods either treat the entire policy as a monolithic unit—risking entangled behaviors—or optimize individual aspects without accounting for how errors propagate across modules.
Architecture and Mechanisms
EvoTool decomposes agent tool-use policies into four distinct modules:
- Planner: Decides what subtasks to perform
- Selector: Chooses which tools to use
- Caller: Executes selected tools with appropriate parameters
- Synthesizer: Integrates tool outputs into final responses
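A minimal sketch of this decomposition, assuming each module is represented by its own natural-language prompt (the authors' code is not yet released, so the `ToolUsePolicy` class and its string-prompt representation are illustrative assumptions, not the paper's implementation):

```python
from dataclasses import dataclass

@dataclass
class ToolUsePolicy:
    """Hypothetical container for the four EvoTool modules."""
    planner: str      # governs how the task is split into subtasks
    selector: str     # governs which tool is chosen per subtask
    caller: str       # governs how tool arguments are filled in
    synthesizer: str  # governs how tool outputs become the final answer

    def modules(self) -> dict:
        # Expose modules by name so each can be inspected or mutated alone
        return {
            "planner": self.planner,
            "selector": self.selector,
            "caller": self.caller,
            "synthesizer": self.synthesizer,
        }

policy = ToolUsePolicy(
    planner="Break the task into ordered subtasks.",
    selector="Pick the single most relevant tool per subtask.",
    caller="Fill tool arguments strictly from the subtask context.",
    synthesizer="Merge tool outputs into one grounded answer.",
)
```

Keeping the modules separable is what later allows the framework to edit one of them without touching the other three.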
The framework operates through three novel mechanisms:
Trajectory-Grounded Blame Attribution uses diagnostic traces from agent execution to isolate failures to specific modules, rather than blaming the entire policy.
Feedback-Guided Targeted Mutation edits only the failing module via natural-language critique, improving efficiency by avoiding unnecessary modifications to working components.
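A sketch of the mutation step under the same assumptions as above, where `revise` is a placeholder for an LLM call that rewrites a prompt given a critique:

```python
def revise(prompt, critique):
    # Placeholder for an LLM rewrite; here it just appends the critique.
    return f"{prompt}\n# Revision note: {critique}"

def mutate(policy, blamed, critique):
    """Return a child policy in which only the blamed module is rewritten."""
    child = dict(policy)  # untouched modules are copied verbatim
    child[blamed] = revise(policy[blamed], critique)
    return child

parent = {"planner": "p", "selector": "s", "caller": "c", "synthesizer": "z"}
child = mutate(parent, "caller", "validate argument types before calling")
```

Because the edit is confined to the blamed module, working components cannot be destabilized by the mutation, which is the efficiency argument the paper makes.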
Diversity-Aware Population Selection maintains a population of candidate solutions with complementary strengths, preventing premature convergence and ensuring the algorithm explores diverse solution spaces.
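One simple way to realize this kind of selection (a toy stand-in, not the paper's method) is greedy selection with a diversity penalty: each candidate's fitness is discounted by its behavioral overlap with candidates already kept, here measured as Jaccard similarity over the set of benchmark tasks each candidate solves:

```python
def jaccard(a, b):
    """Overlap of two behavior signatures (sets of solved task ids)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def select(pop, k, penalty=0.5):
    """Greedily keep k candidates, trading raw fitness against redundancy."""
    kept, pool = [], list(pop)
    while pool and len(kept) < k:
        best = max(
            pool,
            key=lambda c: c["fitness"]
            - penalty * max((jaccard(c["solved"], s["solved"]) for s in kept),
                            default=0.0),
        )
        kept.append(best)
        pool.remove(best)
    return kept

pop = [
    {"name": "A", "fitness": 0.90, "solved": {1, 2, 3}},
    {"name": "B", "fitness": 0.88, "solved": {1, 2, 3}},  # behavioral clone of A
    {"name": "C", "fitness": 0.70, "solved": {4, 5}},     # complementary strengths
]
survivors = select(pop, k=2)
# Keeps A and the complementary C rather than the near-duplicate B
```

Plain top-k selection would keep A and B and lose C's complementary coverage, which is precisely the premature convergence the mechanism is meant to prevent.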
Benchmark Results
Across four benchmarks, EvoTool outperformed strong baselines by over 5 percentage points when tested on both GPT-4.1 and Qwen3-8B. The framework also demonstrated superior efficiency compared to baselines and showed strong transferability—policies optimized on one model or task transferred effectively to others.
The gradient-free evolutionary approach eliminates the need for backpropagation through agent trajectories, which is computationally expensive and technically challenging in long-horizon settings.
Implementation Status
The paper is available as a preprint on arXiv (2603.04900) and is under review. The authors indicate code will be released upon paper acceptance, though a release date has not been announced.
What This Means
EvoTool represents a practical alternative to gradient-based optimization for agent policy improvement. By focusing on modular decomposition and targeted mutation, the approach addresses real pain points in agent development: understanding why agents fail and improving specific failure modes without destabilizing working components. The transferability results are particularly significant, suggesting policies learned on one LLM might generalize to others—reducing the need for repeated optimization across different models. This could accelerate deployment of tool-using agents in production systems.