Anthropic postmortem: Claude Code पर 6 हफ़्ते की शिकायतें 3 असंबंधित बदलावों से, Zubnet AI समाचार

Anthropic ने Claude Code की quality पर मार्च से मध्य-अप्रैल तक चली छह हफ़्तों की शिकायतों को समझाते हुए एक विस्तृत engineering postmortem प्रकाशित किया। 4 मार्च से 16 अप्रैल के बीच तीन असंबंधित product-layer changes shipped हुए, हर एक अपने schedule पर traffic के अलग slice को प्रभावित कर रहा था — यही कारण है कि users ने Claude Code का कब उपयोग किया और किन features पर निर्भर रहे, उसके अनुसार बहुत भिन्न symptoms रिपोर्ट किए। महत्त्वपूर्ण रूप से, API और underlying model weights प्रभावित नहीं हुए। तीनों मुद्दे 20 अप्रैल को Claude Code v2.1.116 के साथ हल हो गए हैं, और Anthropic ने सभी subscribers के लिए usage limits रीसेट कर दी हैं। honest disclosure वह हिस्सा है जो इसे "हम product सुधार रहे हैं" वाली typical press release से अलग करता है: विशिष्ट दिनांक, विशिष्ट commits, विशिष्ट quality numbers, और Claude Code team की नामित जवाबदेही।

तीन changes साफ़-सुथरे ढंग से अलग होते हैं। पहला, 4 मार्च को default reasoning effort high से medium पर बदला गया ताकि लंबे thinking periods के दौरान UI freezes ठीक हों — Anthropic स्वीकार करता है कि यह "ग़लत tradeoff" था और 7 अप्रैल को revert कर दिया। सभी models अब default में high या xhigh। दूसरा, 26 मार्च को 1 घंटे से अधिक idle sessions से पुराने thinking sections clear करने का एक optimization (जहाँ वैसे भी पूरा cache miss अनिवार्य था) में एक bug था जो clearing को एक बार के बजाय हर turn पर fire करता था। Claude execute करता रहा पर बढ़ती हुई बिना यह याद रखे कि उसने अपना current approach क्यों चुना। Boris Cherny ने Hacker News पर extreme case को नाम दिया: 900K tokens context वाला user जो एक घंटे के लिए idle रहता है, अगले message पर full cache miss का सामना करेगा, rate limits का significant percentage खाते हुए — विशेषकर Pro users के लिए। 10 अप्रैल को ठीक हुआ। तीसरा, 16 अप्रैल को Opus 4.7 के साथ, Anthropic ने system prompt में verbosity limit जोड़ी, model को निर्देश देते हुए कि "tool calls के बीच text 25 शब्दों या उससे कम रखो" और "final responses 100 शब्दों या उससे कम रखो"। Internal testing में कोई regressions नहीं मिले। जाँच के दौरान चलाए गए व्यापक ablations ने Opus 4.6 और 4.7 दोनों पर 3% quality drop प्रकट किया। 20 अप्रैल को revert। एक meta-finding: Anthropic के अपने Code Review tool को offending PRs के विरुद्ध back-test किया गया — पर्याप्त repo context देने पर Opus 4.7 ने caching bug ढूँढ़ा; Opus 4.6 ने नहीं।

community critique postmortem के साथ तौलने लायक़ है। Hacker News commenters ने सवाल किया कि क्या latency-reduction idle-session pruning का असली driver था, या cost-reduction असली लक्ष्य था और latency बस cover के रूप में थी। कई ने flag किया कि पुराने system prompt के विरुद्ध benchmarks प्रकाशित करना और फिर बिना disclosure नया भेजना "धोखेबाज़ी जैसा लगता है"। Fortune ने रिपोर्ट किया कि users ने Anthropic की प्रारंभिक denials से "gaslighted" महसूस किया। Anthropic ने Fortune को जवाब में कहा कि "compute पूरी industry में एक बाधा है" साथ ही scaling-partnership pitch भी जोड़ा। दो issues जिन्हें postmortem संबोधित नहीं करता वे भी ट्रैक करने लायक़ हैं: Reddit users रिपोर्ट करते हैं कि Claude Code अपेक्षा से अधिक बार सस्ते Haiku model को task delegate करता है — केवल verbose logging में दिखाई देता है — जो automated pipelines में silent है पर interactive use में स्पष्ट है। और कम से कम एक Reddit user ने स्वतंत्र रूप से पुष्टि की कि घोषित 20 अप्रैल revert के बाद भी verbosity instructions system prompt में बने रहे, उसने एक pre-tool hook script बनाया जो स्रोत re-reading skip करने और confident-but-mismatched output के पाँच विशिष्ट failure modes का प्रतिकार करता है। यह तथ्य कि user-built workaround ने quality सुधारी, अपने आप में verbosity limit को quality-degrading change के रूप में postmortem की diagnosis की पुष्टि है।

production में Claude Code चला रहे builders के लिए: canonical fixes v2.1.116 हैं। अगर तुम्हारे पास March-April के दौरान चले CI या automated workflows हैं, तो किसी भी output को तीन-change fingerprint के लिए audit करो — विशेषकर लंबे idle sessions (caching bug) और 16-20 अप्रैल के आसपास कोई भी output (verbosity prompt)। आम तौर पर LLM-backed products ship करने वाले builders के लिए, सीधे उठाने योग्य engineering lesson: internal evals तीनों में से कोई भी मुद्दा पकड़ने में विफल रहे क्योंकि internal staff public release से अलग builds इस्तेमाल कर रहे थे, caching bug केवल एक विशिष्ट state (stale sessions) में manifest हुआ जो eval में नहीं चलाया गया था, और eval suite system prompt changes से 3% quality drop detect करने के लिए बहुत संकीर्ण था। Anthropic का घोषित remediation — internal staff exactly public builds पर, हर system prompt change के लिए broader per-model eval suites, soak periods और gradual rollouts, system prompt changes का careful versioning — एक उचित industry baseline है। "compute एक बाधा है" वाली स्वीकृति भी मायने रखती है: यदि Anthropic ऐसी cost/latency tradeoffs कर रहा है जो दिखाई देने वाले रूप में quality degrade करती हैं, वही logic हर vendor पर लागू होती है, और product procurement teams को vendor SLAs में latency-vs-quality decisions के बारे में स्पष्ट रूप से पूछना चाहिए।

Anthropic postmortem: Claude Code पर 6 हफ़्ते की शिकायतें 3 असंबंधित बदलावों से

और समाचार