NExT-Guard enables real-time LLM safety without training or token labels
Researchers have developed NExT-Guard, a training-free framework that monitors large language model outputs for unsafe content during streaming inference by analyzing latent features extracted with sparse autoencoders (SAEs). The approach outperforms supervised baselines while eliminating the need for costly token-level annotations, making real-time safety monitoring scalable across different models.
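The core idea can be sketched as follows. This is a minimal, purely illustrative toy, not NExT-Guard's actual implementation: it assumes a hypothetical SAE encoder (a linear map plus ReLU), a hand-picked set of "unsafe" latent indices, and an arbitrary activation threshold, then flags each streamed token whose unsafe-latent activation mass exceeds that threshold, with no training or token labels involved.

```python
from typing import Iterable, List

def sae_encode(hidden: List[float], weights: List[List[float]]) -> List[float]:
    """Toy SAE encoder: linear map followed by ReLU yields sparse activations."""
    return [max(0.0, sum(w * h for w, h in zip(row, hidden))) for row in weights]

def stream_monitor(hidden_states: Iterable[List[float]],
                   weights: List[List[float]],
                   unsafe_latents: List[int],
                   threshold: float = 1.0) -> List[bool]:
    """Flag each streamed token whose total activation on the designated
    unsafe latents exceeds the threshold (training-free, label-free)."""
    flags = []
    for hidden in hidden_states:          # one hidden state per generated token
        z = sae_encode(hidden, weights)   # project into sparse latent space
        score = sum(z[i] for i in unsafe_latents)
        flags.append(score > threshold)   # flag token for real-time intervention
    return flags

# Illustrative usage with a 2-d toy model: latent 0 is "unsafe".
w = [[1.0, 0.0], [0.0, 1.0]]
print(stream_monitor([[0.5, 0.0], [2.0, 0.0]], w, unsafe_latents=[0]))
# → [False, True]
```

Because the monitor only reads SAE activations at inference time, swapping in a different base model requires no retraining, which is what makes the approach scale across models.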