LLM News

Every LLM release, update, and milestone.

product update

Google Search launches Canvas: AI-powered workspace for documents, dashboards, and code

Google has launched Canvas, a new feature in Search that transforms AI-assisted search into an interactive workspace. The tool allows US users to build documents, dashboards, and code prototypes directly within the search interface, marking a shift toward positioning search as a collaborative AI assistant rather than a query-answer platform.

2 min read, via the-decoder.com
benchmark, OpenAI

CounselBench reveals critical safety gaps in LLM mental health responses

CounselBench, a new expert-evaluated benchmark, tested GPT-4, LLaMA 3, Gemini, and other LLMs on 2,000 mental health patient questions rated by 100 clinicians. The study found that LLMs frequently provide unauthorized medical advice, overgeneralize, and lack personalization, while models systematically overrate their own performance on safety dimensions.

2 min read, via arxiv.org
benchmark

CFE-Bench: New STEM reasoning benchmark reveals frontier models struggle with multi-step logic

Researchers introduced CFE-Bench (Classroom Final Exam), a multimodal benchmark using authentic university homework and exam problems across 20+ STEM domains to evaluate LLM reasoning capabilities. Gemini 3.1 Pro Preview achieved the highest score at 59.69% accuracy, while analysis revealed frontier models frequently fail to maintain correct intermediate states in multi-step solutions.

2 min read, via arxiv.org