NExT-Guard enables real-time LLM safety without training or token labels
Researchers have developed NExT-Guard, a training-free framework that monitors large language model outputs for unsafe content during streaming inference by analyzing latent features extracted with sparse autoencoders (SAEs). The approach outperforms supervised baselines while eliminating the need for costly token-level annotations, making real-time safety monitoring scalable across different models.
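The core idea can be sketched as follows. This is a minimal, purely illustrative toy, not NExT-Guard's actual implementation: it assumes a hypothetical SAE encoder (a linear map plus ReLU), a hand-picked set of "unsafe" latent indices, and an arbitrary activation threshold, then flags each streamed token whose unsafe-latent activation mass exceeds that threshold, with no training or token labels involved.

```python
from typing import Iterable, List

def sae_encode(hidden: List[float], weights: List[List[float]]) -> List[float]:
    """Toy SAE encoder: linear map followed by ReLU yields sparse activations."""
    return [max(0.0, sum(w * h for w, h in zip(row, hidden))) for row in weights]

def stream_monitor(hidden_states: Iterable[List[float]],
                   weights: List[List[float]],
                   unsafe_latents: List[int],
                   threshold: float = 1.0) -> List[bool]:
    """Flag each streamed token whose total activation on the designated
    unsafe latents exceeds the threshold (training-free, label-free)."""
    flags = []
    for hidden in hidden_states:          # one hidden state per generated token
        z = sae_encode(hidden, weights)   # project into sparse latent space
        score = sum(z[i] for i in unsafe_latents)
        flags.append(score > threshold)   # flag token for real-time intervention
    return flags

# Illustrative usage with a 2-d toy model: latent 0 is "unsafe".
w = [[1.0, 0.0], [0.0, 1.0]]
print(stream_monitor([[0.5, 0.0], [2.0, 0.0]], w, unsafe_latents=[0]))
# → [False, True]
```

Because the monitor only reads SAE activations at inference time, swapping in a different base model requires no retraining, which is what makes the approach scale across models.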