
Anthropic launches Claude Code 'auto mode' with AI-powered permission classifier

TL;DR

Anthropic has released 'auto mode' for Claude Code, a permissions system that sits between conservative defaults and fully disabled safeguards. The feature uses a classifier to approve routine actions, such as file writes and bash commands, automatically while blocking potentially destructive operations.



Anthropic rolled out "auto mode" for Claude Code on March 24, 2026, introducing a new permissions framework that balances developer convenience against safety risks.

The feature addresses a usability problem: Claude Code's default configuration requires explicit user approval before each file write or bash command. Developers seeking faster execution have historically turned off all permission checks with the --dangerously-skip-permissions flag, which creates significant security exposure.

Auto mode introduces a middle path using a machine learning classifier that pre-screens each tool invocation before execution. The classifier identifies potentially destructive actions—including mass file deletion, sensitive data exfiltration, and malicious code patterns—and blocks them automatically. Actions deemed safe proceed without user interruption. If Claude repeatedly attempts blocked actions, the system escalates to a user permission prompt.
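The decision flow described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Anthropic's implementation: the substring checks stand in for the ML classifier, and the escalation threshold of three blocked attempts, as well as every name in the code, are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"  # ask the user before every file write or bash command
    SKIP = "skip"        # no checks at all, like --dangerously-skip-permissions
    AUTO = "auto"        # a classifier screens each tool call first


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ASK_USER = "ask_user"


# Hypothetical stand-in for the ML classifier: crude substring checks for
# mass deletion, disk wiping, and reading private keys. Illustrative only.
DESTRUCTIVE_PATTERNS = ("rm -rf", "mkfs", "~/.ssh/id_")


def looks_destructive(command: str) -> bool:
    """Return True if the command matches any illustrative risky pattern."""
    return any(pattern in command for pattern in DESTRUCTIVE_PATTERNS)


@dataclass
class PermissionGate:
    mode: Mode
    escalate_after: int = 3  # assumed meaning of "repeatedly"
    blocked_count: int = 0   # blocked attempts seen so far in this session

    def check(self, command: str) -> Decision:
        if self.mode is Mode.SKIP:
            return Decision.ALLOW
        if self.mode is Mode.DEFAULT:
            return Decision.ASK_USER
        # AUTO mode: let safe-looking calls through without interruption.
        if not looks_destructive(command):
            return Decision.ALLOW
        # Block risky calls, but hand control to the user once the agent
        # has hit the block threshold.
        self.blocked_count += 1
        if self.blocked_count >= self.escalate_after:
            return Decision.ASK_USER
        return Decision.BLOCK
```

In this sketch, an AUTO-mode session lets `ls src/` through, blocks `rm -rf build/` twice, and passes the third attempt to the user as a permission prompt, while SKIP mode allows everything, mirroring the risk the article attributes to --dangerously-skip-permissions.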

Anthropic explicitly notes that auto mode reduces risk compared to fully disabled permissions but does not eliminate it. The company recommends using auto mode only in isolated development environments.

Rollout Timeline

Claude Teams users gained access to auto mode as a research preview on March 24. Enterprise and API customers will receive access within days, according to Anthropic's announcement.

This update follows Anthropic's unveiling of a separate research preview feature that enables Claude to control macOS directly—another capability gated behind safety controls.

What This Means

Auto mode addresses a genuine friction point in AI-assisted development: the trade-off between safety guardrails and operational efficiency. By delegating routine safety checks to an ML classifier, Anthropic reduces manual approval overhead while keeping the ability to catch genuinely dangerous operations. However, a classifier-based gate also introduces new attack surface: adversarial prompts could potentially exploit its classification boundaries. The fallback to a manual permission prompt when Claude repeatedly attempts blocked actions suggests the system depends partly on Claude adjusting its own behavior rather than on hard technical barriers, which may be bypassable. Enterprise adoption will likely depend on how well the classifier generalizes to production codebases with domain-specific patterns.
