Anthropic postmortem: 6 weeks of Claude Code complaints from 3 unrelated changes, Zubnet AI News

Anthropic published a detailed engineering postmortem explaining the six weeks of Claude Code quality complaints that ran from March through mid-April. Three unrelated product-layer changes shipped between March 4 and April 16, each affecting a different slice of traffic on its own schedule — which is why users reported wildly different symptoms depending on when they used Claude Code and which features they relied on. Critically, the API and the underlying model weights were not affected. All three issues are resolved as of Claude Code v2.1.116 on April 20, and Anthropic has reset usage limits for all subscribers. The honest disclosure is the part that distinguishes this from the typical "we are improving the product" press release: specific dates, specific commits, specific quality numbers, and named accountability from the Claude Code team.

The three changes break down clean. First, on March 4, the default reasoning effort was switched from high to medium to fix UI freezes during long thinking periods — Anthropic acknowledges this was "the wrong tradeoff" and reverted on April 7. All models now default to high or xhigh. Second, on March 26, an optimization to clear old thinking sections from idle sessions (sessions idle 1+ hour, where a full cache miss was unavoidable anyway) had a bug that fired the clearing on every turn instead of just once. Claude kept executing but increasingly without memory of why it chose its current approach. Boris Cherny on Hacker News named the extreme case: a user with 900K tokens in context idling for an hour would face a full cache miss on the next message, consuming a significant percentage of rate limits — especially for Pro users. Fixed April 10. Third, on April 16 alongside Opus 4.7, Anthropic added a verbosity limit to the system prompt instructing the model to "keep text between tool calls to 25 words or less" and "keep final responses to 100 words or less." Internal testing found no regressions. Broader ablations run during the investigation revealed a 3% quality drop on both Opus 4.6 and 4.7. Reverted April 20. As a meta-finding, Anthropic's own Code Review tool back-tested against the offending PRs: Opus 4.7 found the caching bug when given sufficient repo context; Opus 4.6 did not.

The community critique is worth weighting alongside the postmortem. Hacker News commenters questioned whether latency-reduction was the real driver of the idle-session pruning or whether cost-reduction was the actual goal with latency as cover. Several flagged that publishing benchmarks against an older system prompt and then shipping a new one without disclosure "feels deceptive." Fortune reported that users felt "gaslit" by Anthropic's initial denials. Anthropic's own response to Fortune included the line "compute is a constraint across the entire industry" alongside the scaling-partnership pitch. Two issues the postmortem does not address are also worth tracking: Reddit users report that Claude Code delegates more often than expected to the cheaper Haiku model — visible only in verbose logging — which is a quality-risk class that's silent in automated pipelines but obvious in interactive use. And at least one Reddit user independently confirmed that the verbosity instructions remained in the system prompt after the announced April 20 revert, building a pre-tool hook script that countered five specific failure modes around skipped source re-reading and confident-but-mismatched output. The fact that the user-built workaround improved quality is itself a validation of the postmortem's diagnosis of the verbosity limit as quality-degrading.

For builders running Claude Code in production: the canonical fixes are v2.1.116. If you have CI or automated workflows that ran during March-April, audit any output for the three-change fingerprint — especially long idle sessions (caching bug) and any output around April 16-20 (verbosity prompt). For builders shipping LLM-backed products generally, the engineering lesson lifted directly: internal evals failed to catch any of the three issues because internal staff used different builds than the public release, the caching bug only manifested in a specific state (stale sessions) not exercised in eval, and the eval suite was too narrow to detect a 3% quality drop on system prompt changes. Anthropic's stated remediation — internal staff on exact public builds, per-model broader eval suites for every system prompt change, soak periods and gradual rollouts, careful versioning of system prompt changes — is a reasonable industry baseline. The "compute is a constraint" admission also matters: if Anthropic is making cost/latency tradeoffs that visibly degrade quality, the same logic applies to every vendor, and product procurement teams should ask explicitly about latency-vs-quality decisions in vendor SLAs.

Anthropic postmortem: 6 weeks of Claude Code complaints from 3 unrelated changes

More News