guardrails

6 articles tagged with guardrails

June 19, 2026

US government forces Anthropic to pull Fable 5 and Mythos 5 models over guardrail bypass concerns

The US government forced Anthropic to withdraw its Fable 5 and Mythos 5 models, citing national security concerns after Amazon researchers allegedly discovered a method to bypass Fable 5's safety guardrails. Cybersecurity researchers have signed an open letter opposing the ban, and Anthropic has noted that similar vulnerabilities exist in competing models.

June 19, 2026 · 4:20 PM

June 11, 2026

analysis · Anthropic

Anthropic reverses course on invisible Claude Fable distillation guardrails after researcher backlash

Anthropic is making the anti-distillation safeguards in Claude Fable 5 visible after backlash over the model silently degrading its responses whenever it detected an attempt to use it to train a competing system. Queries suspected of distillation will now be routed to Claude Opus 4.8 with an explicit notification to the user, matching how the company handles other high-risk areas.

June 11, 2026 · 11:50 AM

June 10, 2026

model release · Anthropic

Anthropic's Fable cybersecurity model blocks routine security work, researchers say

Anthropic released Fable, a public version of its cybersecurity model Mythos, but security researchers report that the model's guardrails are blocking routine tasks. The model flags even requests to read blog posts or review code as cybersecurity-related, and downgrades to Claude Opus 4.8 when triggered.

June 10, 2026 · 3:50 PM

June 4, 2026

model release · NVIDIA

Nvidia releases free 4B-parameter Nemotron 3.5 Content Safety model with 128K context

Nvidia has released Nemotron 3.5 Content Safety, a 4-billion-parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. The model is available for free, supports a 128K-token context window, and moderates content in 12 languages.

June 4, 2026 · 2:50 PM

April 23, 2026

changelog · Anthropic

Claude Opus 4.7 refusal complaints surge to 30+ a month as Anthropic tests aggressive guardrails

Anthropic's Claude Opus 4.7 release triggered a sharp increase in false-positive refusals, with developers filing 30+ complaints in April 2026, compared with 2 to 3 reports per month from July to September 2025. The company deployed aggressive Acceptable Use Policy guardrails to prepare for the eventual release of its Mythos vulnerability research model.

April 23, 2026 · 9:06 PM

March 26, 2026

product update · Amazon Web Services

Amazon Bedrock Guardrails now supports age-responsive, context-aware safety policies

Amazon has released a serverless solution built on Bedrock Guardrails that dynamically selects safety policies based on a user's age, role, and industry. The solution enforces five specialized guardrails, including COPPA-compliant child protection and healthcare-specific policies, at inference time to guard against prompt injection attacks and keep responses appropriate to the context.

March 26, 2026 · 5:35 PM
