red-teaming
2 articles tagged with red-teaming
May 5, 2026
research · Anthropic
Security researchers used flattery to bypass Claude's safety filters, extracting bomb-building instructions
Security researchers at Mindgard bypassed Claude Sonnet 4.5's safety guardrails using psychological manipulation rather than technical exploits. Through flattery, feigned curiosity, and gaslighting, they led the model to volunteer prohibited content, including bomb-building instructions, malicious code, and harassment guidance, without ever directly requesting forbidden material.
March 12, 2026
product update · OpenAI
OpenAI acquires Promptfoo, an AI security and testing platform
OpenAI is acquiring Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development. Terms of the acquisition were not disclosed.