LLM News

Every LLM release, update, and milestone.

research

New benchmark reveals LLMs struggle with genuine knowledge discovery in biology

Researchers have introduced DBench-Bio, a dynamic benchmark that addresses a fundamental problem: existing AI evaluations rely on static datasets that models have likely encountered during training. The framework uses a three-stage pipeline to generate questions from recent biomedical papers, refreshed monthly, to test whether leading LLMs can discover new knowledge rather than regurgitate training data.

March 5, 2026 · 6:07 AM · 2 min read

benchmark knowledge-discovery LLM-evaluation

via arxiv.org ↗

benchmark

CareMedEval benchmark reveals LLMs struggle with biomedical critical appraisal despite reasoning improvements

Researchers introduced CareMedEval, a 534-question benchmark derived from French medical student exams, to evaluate LLMs on biomedical critical appraisal and reasoning tasks. None of the state-of-the-art models tested exceeds 50% exact-match accuracy, and the models are particularly weak at evaluating study limitations and statistical analysis.

March 5, 2026 · 5:07 AM · 2 min read

benchmark biomedical-ai llm-evaluation

via arxiv.org ↗