LLM News

Every LLM release, update, and milestone.

benchmark · Anthropic

FinRetrieval benchmark reveals Claude Opus achieves 90.8% accuracy on financial data retrieval with APIs

Researchers introduced FinRetrieval, a 500-question benchmark evaluating AI agents' ability to retrieve specific financial data from structured databases. Across 14 tested configurations spanning Anthropic, OpenAI, and Google, Claude Opus achieves 90.8% accuracy with structured data APIs but only 19.8% with web search, a gap of 71 percentage points that is 3-4x larger than the gap seen for competitors.
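To make the headline numbers concrete, here is a minimal sketch of the arithmetic behind the 71-point gap. It assumes each configuration was scored on all 500 questions, which the summary does not state explicitly, and the function names are illustrative only.

```python
# Reported accuracies from the summary above (percent).
API_ACCURACY = 90.8
WEB_SEARCH_ACCURACY = 19.8
# Assumption: every configuration answered the full 500-question set.
N_QUESTIONS = 500


def percentage_point_gap(acc_a: float, acc_b: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return round(acc_a - acc_b, 1)


def correct_count(accuracy_pct: float, n_questions: int) -> int:
    """Number of correct answers implied by an accuracy percentage."""
    return round(accuracy_pct / 100 * n_questions)


gap = percentage_point_gap(API_ACCURACY, WEB_SEARCH_ACCURACY)
api_correct = correct_count(API_ACCURACY, N_QUESTIONS)
web_correct = correct_count(WEB_SEARCH_ACCURACY, N_QUESTIONS)

print(f"gap: {gap} percentage points")
print(f"implied correct answers: {api_correct} (APIs) vs {web_correct} (web search)")
```

Under that assumption, the two accuracies correspond to 454 and 99 correct answers out of 500.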

product update · Anthropic

Anthropic adds memory feature to free Claude plan with cross-platform import tool

Anthropic has expanded Claude's memory feature to free plan users and launched a dedicated import tool for transferring conversation history from competing chatbots like OpenAI's ChatGPT and Google's Gemini. The update aims to reduce friction for users switching to Claude by preserving their AI interaction context and preferences.

2 min read · via theverge.com