ObfusQAte framework reveals LLMs hallucinate when faced with obfuscated questions
Researchers have introduced ObfusQAte, a new benchmark framework designed to test the robustness of large language models on obfuscated factual questions. Evaluations with the framework show that leading LLMs fail and hallucinate at markedly higher rates when the same factual questions are rephrased with increasingly indirect and nuanced language.
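To make the evaluation idea concrete, here is a minimal sketch of how such a robustness check could be structured: score a model on a plain factual question and on progressively obfuscated rephrasings of it, then compare. The example item, the `query_model` stub, and the exact-match scoring are illustrative assumptions, not ObfusQAte's actual pipeline.

```python
# Minimal sketch of an obfuscation-robustness check for factual QA.
# All names and data here are hypothetical; this is not ObfusQAte's API.

from dataclasses import dataclass


@dataclass
class QAItem:
    original: str            # the plain factual question
    obfuscations: list[str]  # increasingly indirect rephrasings
    gold: str                # expected answer


def query_model(question: str) -> str:
    """Stand-in for a real LLM call; replace with an actual API client."""
    return "Paris"  # canned answer so the sketch runs on its own


def exact_match(prediction: str, gold: str) -> bool:
    """Crude scoring rule; real benchmarks typically use looser matching."""
    return prediction.strip().lower() == gold.strip().lower()


def robustness_report(item: QAItem) -> dict[str, bool]:
    """Score the model on the original question and each obfuscated variant."""
    results = {"original": exact_match(query_model(item.original), item.gold)}
    for i, variant in enumerate(item.obfuscations, start=1):
        results[f"obfuscation_{i}"] = exact_match(query_model(variant), item.gold)
    return results


if __name__ == "__main__":
    item = QAItem(
        original="What is the capital of France?",
        obfuscations=[
            "Which city serves as the seat of government for the country "
            "whose flag is blue, white, and red?",
            "Name the metropolis on the Seine that hosts the Élysée Palace.",
        ],
        gold="Paris",
    )
    print(robustness_report(item))
```

A per-item report like this makes the failure pattern visible: a model that answers the original question correctly but misses the obfuscated variants is relying on surface phrasing rather than the underlying fact.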