hallucination

3 articles tagged with hallucination

April 11, 2026
benchmark

AI models guess instead of asking for help, ProactiveBench study shows

Researchers introduced ProactiveBench, a benchmark testing whether multimodal language models ask for help when visual information is missing. Out of 22 models tested—including GPT-4.1, GPT-5.2, and o4-mini—almost none proactively request clarification, instead hallucinating or refusing to respond. A reinforcement learning approach showed models can be trained to ask for help, improving performance from 17.5% to 37-38%, though significant gaps remain.

March 14, 2026
benchmark, xAI

Grok 4.20 trails GPT-5.4 and Gemini 3.1 but achieves record 78% non-hallucination rate

xAI's Grok 4.20 scores 48 on Artificial Analysis' Intelligence Index—6 points ahead of Grok 4 but trailing Gemini 3.1 Pro Preview and GPT-5.4 at 57. The model distinguishes itself with a 78% non-hallucination rate on the AA Omniscience test, the highest recorded across any model tested.

February 22, 2026
research, Apple

Apple Intelligence generates stereotyped summaries across hundreds of millions of devices

Apple Intelligence, which automatically summarizes notifications and messages on hundreds of millions of devices, systematically generates stereotyped and hallucinated content, according to an independent AI Forensics investigation. The analysis of over 10,000 AI-generated summaries reveals bias baked into the feature, which pushes problematic assumptions to users unprompted.