FinRetrieval benchmark reveals Claude Opus achieves 90.8% accuracy on financial data retrieval with APIs
Researchers introduced FinRetrieval, a 500-question benchmark that evaluates how well AI agents retrieve specific financial data from structured databases. Across 14 configurations of models from Anthropic, OpenAI, and Google, Claude Opus reached 90.8% accuracy with structured data APIs but only 19.8% with web search. That 71-percentage-point gap is three to four times larger than the corresponding gaps for competitors' models.