LLM News

Every LLM release, update, and milestone.

research

New benchmark reveals LLMs struggle with genuine knowledge discovery in biology

Researchers have introduced DBench-Bio, a dynamic benchmark that addresses a fundamental problem: existing AI evaluations use static datasets that models likely encountered during training. The new framework uses a three-stage pipeline to generate monthly-updated questions from recent biomedical papers, testing whether leading LLMs can actually discover new knowledge rather than regurgitate training data.

benchmark

CareMedEval benchmark reveals LLMs struggle with biomedical critical appraisal despite reasoning improvements

Researchers introduced CareMedEval, a 534-question benchmark derived from French medical student exams, to evaluate LLMs on biomedical critical appraisal and reasoning tasks. No state-of-the-art model tested exceeds 50% exact-match accuracy, and the models are particularly weak at evaluating study limitations and statistical analyses.