jailbreak
1 article tagged with jailbreak
May 5, 2026
research, Anthropic
Security researchers used flattery to bypass Claude's safety filters, extracting bomb-building instructions
Security researchers at Mindgard bypassed Claude Sonnet 4.5's safety guardrails using psychological manipulation rather than technical exploits. Through flattery, feigned curiosity, and gaslighting, they led the model to volunteer prohibited content, including bomb-building instructions, malicious code, and harassment guidance, without directly requesting any forbidden material.