LLM News

Every LLM release, update, and milestone.


Study shows RL training enables LLMs to abstain on unanswerable temporal questions, outperforming GPT-4o

A new arXiv study presents the first systematic evaluation of training large language models to abstain (refuse to answer) on temporal questions they cannot reliably answer. Using reinforcement learning with abstention-aware rewards, the researchers achieved 3.46-5.80% higher accuracy on temporal QA benchmarks than GPT-4o, while improving true positive rates on unanswerable questions by 20%.

2 min read · via arxiv.org