LLM News

Every LLM release, update, and milestone.

Filtered by: security

product update · Amazon Web Services

AWS adds Policy Engine and Lambda interceptors to Bedrock AgentCore gateway for agent security controls

Amazon Web Services launched Policy Engine and Lambda interceptors for Bedrock AgentCore gateway, enabling enterprises to control which tools AI agents can access and validate requests dynamically. The Policy Engine uses Cedar declarative policy language for deterministic access decisions, while Lambda interceptors run custom code before or after each tool call for validation, token exchange, and response filtering.
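The split described above, a declarative policy check followed by custom code around each tool call, can be sketched in a few lines of Python. This is a minimal illustration and not the AgentCore or Cedar API: `Rule`, `is_allowed`, `call_tool`, the agent and tool names, and both interceptor functions are hypothetical. The only behaviour borrowed from Cedar is its documented evaluation rule: deny by default, and any matching `forbid` overrides a `permit`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for one policy statement."""
    effect: str      # "permit" or "forbid"
    principal: str   # agent id, or "*" for any agent
    tool: str        # tool name, or "*" for any tool


def _matches(rule: Rule, principal: str, tool: str) -> bool:
    return rule.principal in ("*", principal) and rule.tool in ("*", tool)


def is_allowed(rules: list, principal: str, tool: str) -> bool:
    """Deterministic decision: deny by default, forbid beats permit."""
    matching = [r for r in rules if _matches(r, principal, tool)]
    if any(r.effect == "forbid" for r in matching):
        return False
    return any(r.effect == "permit" for r in matching)


def call_tool(rules: list, principal: str, tool: str, args: dict, tools: dict,
              before: Optional[Callable[[str, dict], dict]] = None,
              after: Optional[Callable[[str, Any], Any]] = None) -> Any:
    """Check policy, then run optional interceptors around the tool call."""
    if not is_allowed(rules, principal, tool):
        raise PermissionError(f"{principal} may not call {tool}")
    if before is not None:
        args = before(tool, args)        # e.g. validate or rewrite the request
    result = tools[tool](**args)
    if after is not None:
        result = after(tool, result)     # e.g. filter the response
    return result


# Illustrative data only; none of these names come from the article.
RULES = [
    Rule("permit", "support-agent", "lookup_order"),
    Rule("permit", "support-agent", "refund"),
    Rule("forbid", "*", "refund"),       # overrides the permit above
]


def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped", "card_number": "4111-0000"}


TOOLS = {"lookup_order": lookup_order}


def require_numeric_order_id(tool: str, args: dict) -> dict:
    """Request interceptor: reject malformed input before the tool runs."""
    if not str(args.get("order_id", "")).isdigit():
        raise ValueError("order_id must be numeric")
    return args


def redact_card(tool: str, result: dict) -> dict:
    """Response interceptor: drop a sensitive field from the tool output."""
    return {k: v for k, v in result.items() if k != "card_number"}
```

Keeping the allow/deny decision in data and the validation in code mirrors the division the summary describes: the policy result depends only on the rules and the request, while the interceptors carry whatever custom logic a deployment needs.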

3 min read · via aws.amazon.com

product update · Anthropic

Anthropic adds MCP tunnels and self-hosted sandboxes to Claude Managed Agents for enterprise security

Anthropic has added two enterprise security features to Claude Managed Agents: MCP tunnels, which route agent services through private networks without public internet exposure, and self-hosted sandboxes, which keep sensitive tool execution within customer infrastructure while Anthropic handles orchestration.

2 min read · via 9to5mac.com

research · Anthropic

Security researchers use Anthropic's Mythos Preview to bypass Apple's M5 memory protection in 5 days

Security researchers at Calif used Anthropic's Mythos Preview model to develop a working macOS kernel memory corruption exploit on M5 silicon in five days, bypassing Apple's Memory Integrity Enforcement (MIE) system. The exploit chain targets macOS 26.4.1 and escalates from unprivileged local user to root shell using two vulnerabilities and several techniques.

3 min read · via 9to5mac.com

product update · OpenAI

OpenAI builds custom Windows sandbox for Codex coding agent after existing tools proved insufficient

OpenAI has implemented a custom sandbox for its Codex coding agent on Windows after determining that existing Windows isolation tools—AppContainer, Windows Sandbox, and Mandatory Integrity Control—could not adequately balance safety and functionality. The solution uses synthetic SIDs and write-restricted tokens to constrain file writes and network access without requiring administrator privileges.

2 min read · via openai.com

product update · OpenAI

OpenAI builds custom Windows sandbox for Codex coding agent without admin privileges

OpenAI developed a custom sandbox implementation for its Codex coding agent on Windows after existing tools like AppContainer and Windows Sandbox failed to meet requirements. The solution uses synthetic SIDs and write-restricted tokens to constrain file writes and network access without requiring administrator privileges.

2 min read · via openai.com

research · Anthropic

Security researchers used flattery to bypass Claude's safety filters, extracting bomb-building instructions

Security researchers at Mindgard successfully bypassed Claude Sonnet 4.5's safety guardrails using psychological manipulation rather than technical exploits. Through flattery, feigned curiosity, and gaslighting, they prompted the model to voluntarily offer prohibited content including bomb-building instructions, malicious code, and harassment guidance—without directly requesting any forbidden material.

2 min read · via theverge.com

product update · OpenAI

OpenAI launches Advanced Account Security for ChatGPT with mandatory passkeys and disabled AI training

OpenAI has released Advanced Account Security, an opt-in feature for ChatGPT users that requires passkey or physical security key authentication, automatically disables AI training on conversations, and implements shorter login sessions. The company partnered with Yubico to offer two YubiKeys for $68, about 46% off the usual $126 price.

2 min read · via zdnet.com

product update · Anthropic

Anthropic's Mythos bug-hunting model accessed by unauthorized users, early tests show performance on par with human rese

Anthropic confirmed unauthorized users accessed its Mythos vulnerability detection model through a third-party vendor environment by guessing URL patterns. Early analysis from Mozilla and AWS indicates Mythos performs on par with elite human security researchers rather than surpassing them, despite Anthropic's claims of identifying thousands of critical vulnerabilities.

3 min read · via go.theregister.com

product update · OpenAI

OpenAI launches Chronicle, opt-in screen capture feature for Codex that mirrors Microsoft Recall

OpenAI has introduced Chronicle, an opt-in research preview for macOS that captures user screens to provide contextual information to its Codex agent. The feature, which echoes Microsoft's controversial Recall, stores screenshots for six hours and sends data to OpenAI servers to generate persistent text-based memories.

2 min read · via go.theregister.com

analysis · Anthropic

Mozilla finds 271 vulnerabilities in Firefox 150 using Anthropic's Claude Mythos Preview

Mozilla's Firefox engineering team identified 271 vulnerabilities for Firefox 150 using Anthropic's Claude Mythos Preview, following a prior collaboration that yielded 22 security-sensitive fixes in Firefox 148 using Opus 4.6. The findings suggest that AI models can now match elite human security researchers at discovering code vulnerabilities.

product update · Anthropic

Anthropic's Claude Mythos cybersecurity model accessed by unauthorized users for two weeks

Anthropic's Claude Mythos Preview, a cybersecurity AI model restricted to select companies including Nvidia, Google, and Microsoft, was accessed by unauthorized users starting April 7, 2025. The group obtained access through a third-party contractor and internet sleuthing techniques, according to Bloomberg.

2 min read · via theverge.com

product update · Replit

Replit Launches Security Agent to Audit AI-Generated Code in Under an Hour

Replit has introduced Security Agent, an AI-powered tool that performs comprehensive security reviews of codebases in under an hour. The agent uses a hybrid approach combining LLMs with Semgrep and HoundDog.ai, and according to recent research can identify up to 93.3% of false positives from traditional static analysis tools.

2 min read · via blog.replit.com