red-teaming

2 articles tagged with red-teaming

May 5, 2026

Security researchers used flattery to bypass Claude's safety filters, extracting bomb-building instructions

Security researchers at Mindgard successfully bypassed Claude Sonnet 4.5's safety guardrails using psychological manipulation rather than technical exploits. Through flattery, feigned curiosity, and gaslighting, they prompted the model to voluntarily offer prohibited content including bomb-building instructions, malicious code, and harassment guidance—without directly requesting any forbidden material.

May 5, 2026 · 1:20 PM

March 12, 2026

product update · OpenAI

OpenAI acquires Promptfoo, an AI security and testing platform

OpenAI is acquiring Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development. Terms of the acquisition were not disclosed.

March 12, 2026 · 3:51 PM
