guardrails
6 articles tagged with guardrails
US government forces Anthropic to pull Fable 5 and Mythos 5 models over guardrail bypass concerns
The US government forced Anthropic to withdraw its Fable 5 and Mythos 5 models, citing national security concerns after Amazon researchers allegedly discovered a method to bypass Fable 5's safety guardrails. Cybersecurity researchers have signed an open letter opposing the ban, with Anthropic noting similar vulnerabilities exist in competing models.
Anthropic reverses course on invisible Claude Fable distillation guardrails after researcher backlash
Anthropic is making its anti-distillation safeguards visible in Claude Fable 5 after backlash over silently degrading responses when it detected attempts to use the model for training competing systems. Queries suspected of distillation will now be routed to Claude Opus 4.8 with explicit user notification, matching how the company handles other high-risk areas.
Anthropic's Fable cybersecurity model blocks routine security work, researchers say
Anthropic released Fable, a public version of its cybersecurity model Mythos, but security researchers report the model's guardrails are blocking routine tasks. The model flags requests as cybersecurity-related even for reading blog posts or requesting code reviews, downgrading to Claude Opus 4.8 when triggered.
Nvidia Releases Free 4B-Parameter Nemotron 3.5 Content Safety Model with 128K Context
Nvidia has released Nemotron 3.5 Content Safety, a 4-billion parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. The model is available for free, supports 128K token context windows, and moderates content across 12 languages.
Claude Opus 4.7 refusal rate surges to 30+ monthly complaints as Anthropic tests aggressive guardrails
Anthropic's Claude Opus 4.7 release triggered a sharp increase in false positive refusals, with developers filing 30+ complaints in April 2026 compared to 2-3 monthly reports from July-September 2025. The company deployed aggressive Acceptable Use Policy guardrails to prepare for the eventual release of its Mythos vulnerability research model.
Amazon Bedrock Guardrails now supports age-responsive, context-aware safety policies
Amazon has released a serverless architecture solution using Bedrock Guardrails that dynamically selects safety policies based on user age, role, and industry. The solution enforces five specialized guardrails—including COPPA-compliant child protection and healthcare-specific policies—at inference time to prevent prompt injection attacks and ensure context-appropriate responses.