honesty
1 article tagged with honesty
June 2, 2026
benchmark, Anthropic
Claude Opus 4.8 fails legal reasoning test despite improved honesty scores
Anthropic's Claude Opus 4.8 handled uncertainty better than its predecessor in independent testing across coding, medical, and financial scenarios. However, the model made a significant judgment error in a legal reasoning test involving travel insurance claims, according to results published by ZDNET.