AMD AI director reports Claude Code performance degradation since March update
Stella Laurenzo, director of AI at AMD, filed a GitHub issue documenting significant performance degradation in Claude Code since early March, specifically following the deployment of thinking content redaction in version 2.1.69. Analysis of 6,852 sessions with 234,760 tool calls shows stop-hook violations increased from zero to 10 per day, while code-reading behavior dropped from 6.6 reads to 2 reads per session.
AMD's Stella Laurenzo, director of the company's AI group, has documented a significant performance decline in Claude Code since an early March update, claiming the model has become "dumber and lazier" in complex engineering tasks.
The Data
Laurenzo filed a GitHub issue on April 4, 2026, backed by analysis of 6,852 Claude Code sessions incorporating 234,760 tool calls and 17,871 thinking blocks. The degradation pattern emerged sharply after March 8, 2026, correlating with the deployment of Claude Code version 2.1.69, which introduced thinking content redaction—a feature that strips thinking content from API responses by default.
Key metrics from AMD's analysis:
- Stop-hook violations (indicating laziness): Rose from zero before March 8 to an average of 10 per day by the end of the month
- Code reads before edits: Dropped from 6.6 reads per session to 2
- File rewrites vs. targeted edits: Frequency of full-file rewrites increased significantly
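As a rough illustration of how metrics like these could be derived from session logs, here is a minimal sketch. It is not the script behind the AMD analysis: the record layout (`session_id`, `tool`) and the tool labels `Read`, `Edit`, and `Write` are assumptions made for the example, and the function name `summarize_tool_calls` is hypothetical.

```python
from collections import Counter


def summarize_tool_calls(calls):
    """Summarize read and rewrite behavior across sessions.

    `calls` is an iterable of dicts with the assumed keys
    "session_id" and "tool" (hypothetical schema for illustration).
    """
    sessions = set()
    tool_counts = Counter()
    for call in calls:
        sessions.add(call["session_id"])
        tool_counts[call["tool"]] += 1

    reads = tool_counts["Read"]      # file reads
    edits = tool_counts["Edit"]      # targeted edits
    rewrites = tool_counts["Write"]  # full-file rewrites
    modifications = edits + rewrites

    return {
        "sessions": len(sessions),
        # Average number of file reads per session
        "reads_per_session": reads / len(sessions) if sessions else 0.0,
        # Share of file modifications that were full rewrites
        "rewrite_share": rewrites / modifications if modifications else 0.0,
    }


if __name__ == "__main__":
    sample = [
        {"session_id": "a", "tool": "Read"},
        {"session_id": "a", "tool": "Read"},
        {"session_id": "a", "tool": "Read"},
        {"session_id": "a", "tool": "Edit"},
        {"session_id": "b", "tool": "Read"},
        {"session_id": "b", "tool": "Write"},
    ]
    print(summarize_tool_calls(sample))
```

Comparing these summaries for sessions before and after a given date would show whether reads per session fell or the rewrite share rose, which is the kind of shift the issue describes.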
"Every senior engineer on my team has reported similar experiences," Laurenzo stated in the issue.
The Thinking Redaction Connection
Laurenzo attributes the degradation to thinking content redaction introduced in v2.1.69. The feature defaults to hiding Claude's reasoning process from users, preventing visibility into what the model is actually doing while processing requests.
"When thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without finishing, dodge responsibility for failures, take the simplest fix rather than the correct one," Laurenzo wrote. "These are exactly the symptoms observed."
This is separate from a February incident in which Claude Code version 2.1.20 caused the model to truncate explanations of its reading process, reducing them to brief file counts.
Anthropic's Credibility Crisis
The report arrives amid mounting pressure on Anthropic:
- Unexplained surges in token usage have pushed users past billing limits
- Claude Code's entire source code was exposed publicly
- The exposed code revealed how extensively Claude Code can access user system information
Laurenzo's requests to Anthropic are direct: transparency about whether thinking tokens are being reduced or capped, exposure of token counts per request, and a premium tier offering guaranteed deep thinking for complex workflows.
"The current subscription model doesn't distinguish between users who need 200 thinking tokens per response and users who need 20,000," she explained. "Users running complex engineering workflows would pay significantly more for guaranteed deep thinking."
Market Implications
While Laurenzo indicated her team has switched to another provider offering "superior quality work," she's leaving the GitHub issue as a public notice to Anthropic. Her warning is stark: "It's still early in the AI coding game and Anthropic is looking at giving up the top spot if its behavior continues."
She declined to specify the alternative tool due to NDA restrictions, but confirmed it's delivering better results for AMD's complex engineering workflows.
What this means
Laurenzo's analysis, backed by specific behavioral metrics from a real high-complexity engineering environment, suggests that Claude Code's thinking quality may have genuinely degraded, and that the likely cause is architectural changes suppressing reasoning depth rather than a loss of model capability. The March timing and the correlation with v2.1.69 are testable claims that Anthropic should address transparently. For enterprise users relying on Claude Code for complex tasks, the core issue is transparency: without visibility into thinking token allocation and depth, there is no way to distinguish legitimate efficiency improvements from cost-cutting that sacrifices quality. Anthropic faces pressure to restore visibility, confirm that no degradation occurred, or explain the trade-offs being made.