Exgentic

1 article tagged with Exgentic

May 18, 2026
benchmark

IBM Research launches Open Agent Leaderboard, showing same models achieve different results based on agent architecture

IBM Research has launched the Open Agent Leaderboard, the first open benchmark that evaluates complete AI agent systems rather than just underlying models. The leaderboard reveals that agents using identical models can achieve significantly different success rates and costs depending on system architecture, with failed runs costing 20-54% more than successful ones.