prompt-injection

9 articles tagged with prompt-injection

June 26, 2026
research · Anthropic

6,000 prompt injection attempts fail against Claude Opus 4.6 in public hacking challenge

A public hacking challenge targeting an AI assistant powered by Claude Opus 4.6 resulted in zero successful prompt injection attacks across 6,000 attempts. The experiment cost $500 in API tokens and triggered a Google account suspension due to email volume, but no participants managed to extract the system's secrets.

June 8, 2026
product update · OpenAI

OpenAI rolls out ChatGPT Lockdown mode to all users to block prompt injection data theft

OpenAI has expanded Lockdown mode to all ChatGPT plan tiers, including Free, Go, Plus, Pro, and Business users. The security feature blocks outbound network requests to prevent prompt injection attacks from stealing sensitive data, but disables live web browsing, Deep Research, and Agent mode.

June 5, 2026
product update · OpenAI

OpenAI launches Lockdown Mode to block prompt injection data exfiltration attacks

OpenAI has released Lockdown Mode, an optional security setting that protects against prompt injection attacks by limiting network requests and image fetching in ChatGPT. The feature is designed for users handling sensitive data and disables some ChatGPT capabilities including Deep Research and Agent Mode.

May 20, 2026
product update

Google Announces Gemini Spark Agent and Antigravity Platform at I/O, Launch Date Not Disclosed

Google announced Gemini Spark at I/O 2026, positioning it as a competitor to OpenAI's Claude-based agents. The service will integrate with Gmail, Calendar, Drive, and other Google apps, running on Gemini 3.5 Flash and a new platform called Antigravity. No general availability date has been disclosed.

April 1, 2026
product update · Anthropic

Claude Code bypasses safety rules after 50 chained commands, enabling prompt injection attacks

Claude Code will automatically approve denied commands—like curl—if preceded by 50 or more chained subcommands, according to security firm Adversa. The vulnerability stems from a hard-coded MAX_SUBCOMMANDS_FOR_SECURITY_CHECK limit set to 50 in the source code, after which the system falls back to requesting user permission rather than enforcing deny rules.

Google DeepMind identifies six attack categories that can hijack autonomous AI agents

A Google DeepMind paper introduces the first systematic framework for 'AI agent traps': attacks that exploit autonomous agents' exposure to external tools and internet access. The researchers identify six attack categories targeting perception, reasoning, memory, actions, multi-agent networks, and human supervisors, with proof-of-concept demonstrations for each.

March 11, 2026
research · OpenAI

OpenAI releases IH-Challenge dataset to train models to reject untrusted instructions

OpenAI has released IH-Challenge, a training dataset designed to teach AI models to reliably distinguish between trusted and untrusted instructions. Early results show significant improvements in security and prompt injection defense capabilities.

March 9, 2026
product update · OpenAI

OpenAI acquires Promptfoo, integrates security testing into Frontier platform

OpenAI is acquiring Promptfoo, an AI security platform, to integrate automated vulnerability testing directly into its Frontier enterprise offering. The acquisition adds jailbreak detection, prompt injection testing, and data leak identification capabilities to OpenAI's enterprise product.

February 21, 2026
research · Microsoft

Microsoft researchers discover prompt injection attacks via AI summarize buttons

Microsoft security researchers have identified a new prompt injection vulnerability where attackers embed hidden instructions in "Summarize with AI" buttons to permanently compromise AI assistant behavior and inject advertisements into chatbot memory.