Anthropic publishes postmortem on Claude Code's degraded intelligence: three stacked product-level changes, not a model issue


According to Beating Monitoring, the Anthropic engineering team issued a statement confirming that the quality decline users have reported in Claude Code over the past month stemmed from three independent product-level changes; the API and underlying models were unaffected. The three issues were fixed on April 7, 10, and 20, with the final fix shipping in v2.1.116.

The first change occurred on March 4. To reduce occasional, extremely long delays (the UI appearing frozen) caused by high inference load on Opus 4.6, the team lowered Claude Code's default inference strength from high to medium. Users widely reported the model becoming noticeably less capable, so the change was rolled back on April 7; Opus 4.7 now defaults to xhigh, while other models default to high.
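The per-model defaults described above can be sketched as a simple lookup. This is a hypothetical illustration; the model identifiers, the effort scale, and the function name are assumptions based on the article, not Anthropic's actual implementation.

```python
# Hypothetical sketch: choosing a default inference-effort level per model.
# After the April 7 rollback, Opus 4.7 defaults to "xhigh" and other
# models default to "high", per the article.

DEFAULT_EFFORT = {
    "opus-4.7": "xhigh",  # raised default after the rollback
}
FALLBACK_EFFORT = "high"  # all other models


def default_effort(model: str) -> str:
    """Return the default inference strength for a given model name."""
    return DEFAULT_EFFORT.get(model, FALLBACK_EFFORT)
```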

The second issue was a bug introduced on March 26. The design intent was to clear stale inference records once, when resuming a session that had been inactive for over an hour, in order to save costs. An implementation flaw caused the clearing to run not once but on every subsequent round, so the model gradually lost its prior inference context, manifesting as increasing forgetfulness, repeated operations, and abnormal tool calls. The bug also caused a cache miss on every request, accelerating users' quota consumption. The team stated that two unrelated internal experiments masked the reproduction conditions, and troubleshooting took over a week; the bug was fixed on April 10. In a later code-review backtest on the problematic PR, Opus 4.7 was able to detect the bug, while Opus 4.6 was not.
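The failure mode described above, a one-time cleanup accidentally running on every turn, can be sketched as follows. All names (`Session`, `on_turn`, the guard flag) are invented for illustration; only the intended behavior and the one-hour threshold come from the article.

```python
# Minimal sketch of the intended session-resume cleanup. The described
# bug is equivalent to omitting the `cleanup_done` guard below, so the
# clear ran on every round, steadily erasing context and defeating the
# prompt cache.

import time

IDLE_LIMIT = 3600  # one hour of inactivity, per the article


class Session:
    def __init__(self):
        self.records = []            # prior inference/reasoning records
        self.last_active = time.time()
        self.cleanup_done = False    # ensures the clear happens only once

    def on_turn(self, record):
        idle = time.time() - self.last_active
        # Intended behavior: clear stale records ONCE when resuming
        # after more than an hour of inactivity.
        if idle > IDLE_LIMIT and not self.cleanup_done:
            self.records.clear()
            self.cleanup_done = True
        self.last_active = time.time()
        self.records.append(record)
```

With the guard in place, context accumulates normally after a single post-resume cleanup; without it, every turn would wipe the history written before it.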

The third change was deployed alongside Opus 4.7 on April 16. The team added a length-limit instruction to the system prompt: “Text between tool calls should not exceed 25 words, and the final reply should not exceed 100 words unless more details are required.” Several weeks of internal testing showed no regression, but after deployment the instruction, stacked with other prompts, degraded coding quality across Sonnet 4.6, Opus 4.6, and Opus 4.7. A broader evaluation revealed a roughly 3% decline for both Opus 4.6 and 4.7, leading to a rollback on April 20.
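The prompt-stacking mechanism at issue can be sketched as simple fragment concatenation. The quoted instruction is from the article; the assembly function and the other fragment are assumptions for illustration, not Anthropic's actual prompt pipeline.

```python
# Hedged sketch: a system prompt assembled from stacked fragments.
# Each fragment can test fine in isolation, but combinations may
# interact; per the article, this stacking is what hurt coding quality.

LENGTH_LIMIT_RULE = (
    "Text between tool calls should not exceed 25 words, and the final "
    "reply should not exceed 100 words unless more details are required."
)


def build_system_prompt(*fragments: str) -> str:
    """Join non-empty prompt fragments into one system prompt."""
    return "\n\n".join(f for f in fragments if f)
```

One implication, reflected in Anthropic's stated process changes, is that every prompt modification needs to be evaluated in the fully assembled prompt, not just in isolation.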

These three changes affected different user groups and took effect at different times; their overlapping effects produced quality declines that were widespread yet inconsistent, which complicated troubleshooting. Anthropic stated that going forward it will require more internal staff to use the same publicly released build as users, run the full model-evaluation suite on every system-prompt modification, and adopt a gray (staged) release period.

As compensation, Anthropic has reset the usage quotas for all subscribed users.
