LLM News

Every LLM release, update, and milestone.


benchmark · OpenAI

CounselBench reveals critical safety gaps in LLM mental health responses

CounselBench, a new expert-evaluated benchmark, tested GPT-4, LLaMA 3, Gemini, and other LLMs on 2,000 mental health patient questions rated by 100 clinicians. The study found that LLMs frequently provide unauthorized medical advice, overgeneralize, and lack personalization, and that models systematically overrate their own performance on safety dimensions.

March 5, 2026 · 5:39 AM · 2 min read

benchmark mental-health safety

via arxiv.org ↗

research

MedXIAOHE: New medical vision-language model claims state-of-the-art performance on clinical benchmarks

Researchers have published MedXIAOHE, a medical multimodal foundation model designed for clinical applications. According to the authors, the model achieves state-of-the-art performance across diverse medical benchmarks and surpasses several closed-source multimodal systems on multiple capabilities.

March 5, 2026 · 12:51 AM · 2 min read

medical-ai vision-language-model multimodal

via arxiv.org ↗