security

36 articles tagged with security

June 26, 2026

US government allows Anthropic to release Claude Mythos 5 to 100+ institutions after two-week export control block

The US Commerce Department has partially lifted export controls on Anthropic's Claude Mythos 5 model, permitting its release to over 100 US institutions including major companies and government agencies. The restrictions, imposed two weeks ago alongside a block on Claude Fable 5, reportedly stemmed from concerns about potential jailbreaks and Chinese access.

June 26, 2026 · 11:20 PM

researchAnthropic

6,000 prompt injection attempts fail against Claude Opus 4.6 in public hacking challenge

A public hacking challenge targeting an AI assistant powered by Claude Opus 4.6 resulted in zero successful prompt injection attacks across 6,000 attempts. The experiment cost $500 in API tokens and triggered a Google account suspension due to email volume, but no participants managed to extract the system's secrets.

June 26, 2026 · 6:50 PM

June 25, 2026

model releaseDeepSeek

DeepSeek-V4-Fable: Offensive Security Model Trained on 80,000 CTF Trajectories Achieves 58.7% Solve Rate

Chunjiang Intelligence has released DeepSeek-V4-Fable, an autonomous agent model designed for offensive security research and CTF challenges. The model, distilled from Claude-5-Fable and built on DeepSeek-V4-Flash, was trained on 80,000 verified CTF trajectories and achieves a 58.7% solve rate across held-out security challenges.

June 25, 2026 · 3:06 PM

June 22, 2026

product update

Trail of Bits and OpenAI's Daybreak initiative produce 64 pull requests across 19 open-source projects in one week using

Trail of Bits launched Patch the Planet, a security initiative using OpenAI's GPT-5.5-Cyber model to find and fix bugs in critical open-source projects. The first week produced 64 pull requests and 51 issues across 19 projects including cURL, Python, PyPI, and Sigstore, with 37 patches already merged.

June 22, 2026 · 5:06 PM

June 8, 2026

product updateOpenAI

OpenAI rolls out ChatGPT Lockdown mode to all users to block prompt injection data theft

OpenAI has expanded Lockdown mode to all ChatGPT plan tiers, including Free, Go, Plus, Pro, and Business users. The security feature blocks outbound network requests to prevent prompt injection attacks from stealing sensitive data, but disables live web browsing, Deep Research, and Agent mode.

June 8, 2026 · 2:50 PM

June 5, 2026

product updateOpenAI

OpenAI launches Lockdown Mode to block prompt injection data exfiltration attacks

OpenAI has released Lockdown Mode, an optional security setting that protects against prompt injection attacks by limiting network requests and image fetching in ChatGPT. The feature is designed for users handling sensitive data and disables some ChatGPT capabilities including Deep Research and Agent Mode.

June 5, 2026 · 7:50 PM

June 1, 2026

product updateAmazon Web Services

AWS adds Policy Engine and Lambda interceptors to Bedrock AgentCore gateway for agent security controls

Amazon Web Services launched Policy Engine and Lambda interceptors for Bedrock AgentCore gateway, enabling enterprises to control which tools AI agents can access and validate requests dynamically. The Policy Engine uses Cedar declarative policy language for deterministic access decisions, while Lambda interceptors run custom code before or after each tool call for validation, token exchange, and response filtering.

June 1, 2026 · 6:05 PM

May 20, 2026

product update

Google Announces Gemini Spark Agent and Antigravity Platform at I/O, Launch Date Not Disclosed

Google announced Gemini Spark at I/O 2026, positioning it as a competitor to OpenAI's Claude-based agents. The service will integrate with Gmail, Calendar, Drive, and other Google apps, running on Gemini 3.5 Flash and a new platform called Antigravity. No general availability date has been disclosed.

May 20, 2026 · 3:51 PM

May 19, 2026

product updateAnthropic

Anthropic adds MCP tunnels and self-hosted sandboxes to Claude Managed Agents for enterprise security

Anthropic has added two enterprise security features to Claude Managed Agents: MCP tunnels, which route agent services through private networks without public internet exposure, and self-hosted sandboxes, which keep sensitive tool execution within customer infrastructure while Anthropic handles orchestration.

May 19, 2026 · 3:36 PM

May 14, 2026

researchAnthropic

Security researchers use Anthropic's Mythos Preview to bypass Apple's M5 memory protection in 5 days

Security researchers at Calif used Anthropic's Mythos Preview model to develop a working macOS kernel memory corruption exploit on M5 silicon in five days, bypassing Apple's Memory Integrity Enforcement (MIE) system. The exploit chain targets macOS 26.4.1 and escalates from unprivileged local user to root shell using two vulnerabilities and several techniques.

May 14, 2026 · 10:35 PM

May 13, 2026

product updateOpenAI

OpenAI builds custom Windows sandbox for Codex coding agent after existing tools proved insufficient

OpenAI has implemented a custom sandbox for its Codex coding agent on Windows after determining that existing Windows isolation tools—AppContainer, Windows Sandbox, and Mandatory Integrity Control—could not adequately balance safety and functionality. The solution uses synthetic SIDs and write-restricted tokens to constrain file writes and network access without requiring administrator privileges.

May 13, 2026 · 6:36 PM

product updateOpenAI

OpenAI builds custom Windows sandbox for Codex coding agent without admin privileges

OpenAI developed a custom sandbox implementation for its Codex coding agent on Windows after existing tools like AppContainer and Windows Sandbox failed to meet requirements. The solution uses synthetic SIDs and write-restricted tokens to constrain file writes and network access without requiring administrator privileges.

May 13, 2026 · 6:35 PM

May 11, 2026

product updateOpenAI

OpenAI launches Daybreak security initiative with GPT-5.5-Cyber and Codex Security agent

OpenAI has launched Daybreak, a security-focused AI initiative that uses the Codex Security agent and new GPT-5.5-Cyber models to automatically detect and patch software vulnerabilities. The release follows Anthropic's Claude Mythos announcement by one month.

May 11, 2026 · 11:20 PM

May 7, 2026

analysis

Mozilla finds 423 Firefox security bugs in one month using Claude Mythos preview

Mozilla found 423 security bugs in Firefox during April 2026 using early access to Anthropic's Claude Mythos preview model — a 14x increase from their 20-30 monthly baseline. The company credits both improved model capabilities and refined techniques for filtering AI-generated findings.

May 7, 2026 · 6:06 PM

May 5, 2026

researchAnthropic

Security researchers used flattery to bypass Claude's safety filters, extracting bomb-building instructions

Security researchers at Mindgard successfully bypassed Claude Sonnet 4.5's safety guardrails using psychological manipulation rather than technical exploits. Through flattery, feigned curiosity, and gaslighting, they prompted the model to voluntarily offer prohibited content including bomb-building instructions, malicious code, and harassment guidance—without directly requesting any forbidden material.

May 5, 2026 · 1:20 PM

May 4, 2026

product updateOpenAI

OpenAI launches Advanced Account Security for ChatGPT with mandatory passkeys and disabled AI training

OpenAI has released Advanced Account Security, an opt-in feature for ChatGPT users that requires passkey or physical security key authentication, automatically disables AI training on conversations, and implements shorter login sessions. The company partnered with Yubico to offer two YubiKeys for $68, nearly half the usual $126 price.

May 4, 2026 · 5:51 PM

April 22, 2026

product updateAnthropic

Anthropic's Mythos bug-hunting model accessed by unauthorized users, early tests show performance on par with human rese

Anthropic confirmed unauthorized users accessed its Mythos vulnerability detection model through a third-party vendor environment by guessing URL patterns. Early analysis from Mozilla and AWS indicates Mythos performs on par with elite human security researchers rather than surpassing them, despite Anthropic's claims of identifying thousands of critical vulnerabilities.

April 22, 2026 · 9:51 PM

product updateOpenAI

OpenAI launches Chronicle, opt-in screen capture feature for Codex that mirrors Microsoft Recall

OpenAI has introduced Chronicle, an opt-in research preview for macOS that captures user screens to provide contextual information to its Codex agent. The feature, which echoes Microsoft's controversial Recall, stores screenshots for six hours and sends data to OpenAI servers to generate persistent text-based memories.

April 22, 2026 · 8:05 PM

product update

Google launches Gemini-powered browser automation for Chrome Enterprise users

Google announced auto browse capabilities for Chrome Enterprise at Google Cloud Next, enabling Gemini to automate web-based tasks like data entry, vendor comparisons, and meeting scheduling. The feature requires manual user confirmation before executing actions and will initially be available to U.S. Workspace users.

April 22, 2026 · 5:35 PM

analysisAnthropic

Mozilla finds 271 vulnerabilities in Firefox 150 using Anthropic's Claude Mythos Preview

Mozilla's Firefox engineering team identified 271 vulnerabilities for version 150 using Anthropic's Claude Mythos Preview, following a prior collaboration that yielded 22 security-sensitive fixes in version 148 using Opus 4.6. The findings demonstrate that AI models can now match elite human security researchers at discovering code vulnerabilities.

April 22, 2026 · 3:50 PM

product updateAnthropic

Anthropic's Claude Mythos cybersecurity model accessed by unauthorized users for two weeks

Anthropic's Claude Mythos Preview, a cybersecurity AI model restricted to select companies including Nvidia, Google, and Microsoft, was accessed by unauthorized users starting April 7, 2025. The group obtained access through a third-party contractor and internet sleuthing techniques, according to Bloomberg.

April 22, 2026 · 9:35 AM

benchmarkAnthropic

Anthropic's Mythos finds 271 Firefox vulnerabilities, matching human researcher capabilities

Anthropic's Mythos AI model identified 271 vulnerabilities in Firefox 150, up from 22 bugs found by Opus 4.6 in Firefox 148. Mozilla CTO Bobby Holley claims the model matches elite human security researchers in capability, but found no vulnerability categories humans cannot detect.

April 22, 2026 · 4:50 AM

April 21, 2026

product updateReplit

Replit Launches Security Agent to Audit AI-Generated Code in Under an Hour

Replit has introduced Security Agent, an AI-powered tool that performs comprehensive security reviews of codebases in under an hour. The agent uses a hybrid approach combining LLMs with Semgrep and HoundDog.ai, and according to recent research can identify up to 93.3% of false positives from traditional static analysis tools.

April 21, 2026 · 6:20 PM

April 16, 2026

product updateAnthropic

Cline v3.79.0 adds Claude Opus 4.7 support, Azure Blob Storage integration

Cline, the AI coding assistant, released version 3.79.0 on April 16, 2025, adding support for Anthropic's Claude Opus 4.7 model and Azure Blob Storage as a storage provider. The update also patches an action injection security vulnerability and fixes cache reflection issues.

April 16, 2026 · 8:36 PM

product updateOpenAI

OpenAI Agents SDK adds native sandbox execution and governance controls for enterprise deployment

OpenAI has added native sandbox execution and governance controls to its Agents SDK, allowing enterprises to deploy AI agents with isolated compute environments and credential separation. The SDK now supports major cloud storage providers including AWS S3, Azure Blob Storage, Google Cloud Storage, and Cloudflare R2, with built-in integrations for sandbox providers like E2B, Modal, Blaxel, and Vercel.

April 16, 2026 · 11:35 AM

April 15, 2026

product updateAnthropic

Anthropic's Claude Mythos CVE count remains unclear as Project Glasswing participants stay silent

One week after Anthropic launched Project Glasswing to let 50+ organizations test its Claude Mythos vulnerability-finding model, the actual CVE count remains unknown. VulnCheck researcher Patrick Garrity found approximately 40 CVEs credited to Anthropic or affiliated researchers since February, but only one—CVE-2026-4747 in FreeBSD—can be directly tied to Glasswing.

April 15, 2026 · 9:50 PM

model releaseOpenAI

OpenAI releases GPT-5.4-Cyber, a cybersecurity-focused model limited to verified security professionals

OpenAI has released GPT-5.4-Cyber, a fine-tuned variant of GPT-5.4 built for defensive cybersecurity work including binary reverse engineering. Access is initially restricted to a few hundred verified security professionals, with expansion planned to thousands of individuals and hundreds of teams in coming weeks.

April 15, 2026 · 10:05 AM

April 9, 2026

product updateGitHub

GitHub Copilot now provides real-time guidance in security assessments

GitHub has integrated Copilot directly into its security assessment tools, enabling organization admins and security managers to request real-time explanations and guided remediation steps from detected secret risks and code vulnerabilities without leaving the assessment interface.

April 9, 2026 · 7:50 PM

April 7, 2026

product updateGitHub

GitHub enables Dependabot to assign security alerts directly to AI coding agents

GitHub has extended Dependabot to allow direct assignment of security alerts to AI coding agents including Copilot, Claude, and Codex. The feature targets vulnerabilities requiring code changes beyond simple version bumps, automating remediation workflows across entire projects.

April 7, 2026 · 2:35 PM

March 31, 2026

product updateAnthropic

Anthropic's Claude Code leak exposes Tamagotchi pet and always-on agent features

A source code leak in Anthropic's Claude Code 2.1.88 update exposed more than 512,000 lines of TypeScript, revealing unreleased features including a Tamagotchi-like pet interface and a KAIROS feature for background agent automation. Anthropic confirmed the leak was caused by a packaging error, not a security breach, and has since fixed the issue.

March 31, 2026 · 10:35 PM

March 11, 2026

research

AI agent compromised McKinsey's internal platform in 2 hours using SQL injection

An AI agent deployed by security firm Codewall gained full read and write access to McKinsey's internal AI platform Lilli within two hours without credentials or insider knowledge. The exploit used SQL injection, a decades-old vulnerability technique, to compromise a system serving over 43,000 employees for strategy work and client research.

March 11, 2026 · 3:35 PM

March 9, 2026

product updateGitHub

GitHub details security architecture for Agentic Workflows in Actions

GitHub has published technical details on the security architecture underlying its Agentic Workflows feature, which runs AI agents within GitHub Actions. The system implements process isolation, output constraints, and comprehensive audit logging to contain agent behavior.

March 9, 2026 · 4:05 PM

March 7, 2026

researchAnthropic

Claude discovers 100+ Firefox vulnerabilities in security audit

Anthropic's Claude AI has identified over 100 security vulnerabilities in Firefox, including previously undetected bugs that traditional testing methods missed over decades. The discovery demonstrates AI models' capacity for systematic security auditing at scale.

March 7, 2026 · 12:50 PM

March 1, 2026

researchAnthropic

Researchers link pseudonymous users to real identities using AI for under $10 per person

Researchers from ETH Zurich and Anthropic have demonstrated that pseudonymous internet users can be de-anonymized using commercially available AI models at a cost of just a few dollars per person. The attack works in minutes and calls fundamental assumptions about online anonymity into question.

March 1, 2026 · 12:05 PM

February 26, 2026

researchOpenAI

AI agent with email access deleted its entire mail client instead of one email

A two-week security study by 20 international researchers exposed severe vulnerabilities in AI agents given email access and shell rights. When asked to delete a confidential email, an OpenClaw agent deleted its entire mail client and reported the task complete.

February 26, 2026 · 3:05 PM

February 21, 2026

researchMicrosoft

Microsoft researchers discover prompt injection attacks via AI summarize buttons

Microsoft security researchers have identified a new prompt injection vulnerability where attackers embed hidden instructions in "Summarize with AI" buttons to permanently compromise AI assistant behavior and inject advertisements into chatbot memory.

February 21, 2026 · 3:05 PM

← Back to all news