LLM News | TPS

FinRetrieval benchmark reveals Claude Opus achieves 90.8% accuracy on financial data retrieval with APIs

Researchers introduced FinRetrieval, a 500-question benchmark that evaluates how well AI agents retrieve specific financial data from structured databases. Across 14 configurations of models from Anthropic, OpenAI, and Google, Claude Opus reached 90.8% accuracy with structured data APIs but only 19.8% with web search. That 71-percentage-point gap is roughly 3-4x larger than the corresponding gap for competing models.
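The 71-point figure is a plain difference in percentage points, not a relative change. Here is a minimal Python sketch of that arithmetic. It assumes each configuration answers all 500 questions once and is scored right or wrong. The counts 454 and 99 are back-calculated from the reported percentages for illustration and are not taken from the paper. The helper names `accuracy_pct` and `gap_pp` are hypothetical.

```python
# Minimal sketch: accuracy and percentage-point gap on a 500-question benchmark.
# Assumption: every question is answered once and scored correct/incorrect.
# The counts below are back-calculated from the reported 90.8% and 19.8%,
# not taken from the FinRetrieval paper. Helper names are hypothetical.

TOTAL_QUESTIONS = 500


def accuracy_pct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def gap_pp(a_pct: float, b_pct: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return round(a_pct - b_pct, 1)


def relative_ratio(a_pct: float, b_pct: float) -> float:
    """Relative ratio of two accuracies (how many times larger a is than b)."""
    return round(a_pct / b_pct, 2)


api_acc = accuracy_pct(454)  # hypothetical count: 454 / 500 correct with APIs
web_acc = accuracy_pct(99)   # hypothetical count: 99 / 500 correct with web search

print(api_acc, web_acc)                  # 90.8 19.8
print(gap_pp(api_acc, web_acc))          # 71.0 percentage points
print(relative_ratio(api_acc, web_acc))  # 4.59 (relative, not percentage points)
```

Keeping the two measures separate avoids a common misreading. The drop from 90.8% to 19.8% is 71 percentage points. In relative terms, API accuracy is about 4.6 times web-search accuracy.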

March 6, 2026 · 5:54 AM · 2 min read

benchmark financial-ai agent-evaluation

via arxiv.org ↗