Code With Claude: Managed Agents, routines on cron, SWE-bench jumps from 62% to 87%, Zubnet AI News

Anthropic ran Code With Claude this week and shipped three things that change what builders on Claude have to maintain themselves. Managed Agents are native primitives for sandboxed code execution, checkpointing, and credential scoping. Proactive Workflows are Claude Code routines that fire on cron schedules, GitHub webhooks, or API endpoints, with an Auto mode that adds destructive-action screening and prompt-injection detection, plus worktrees for isolated branch management. The Capability Curve is the framing: SWE-bench Verified moved from 62% (Sonnet 3.7, a year ago) to 87% (Opus 4.7, now). The PM demos were given by Jess Yan and Lance Martin; Alex Albert presented the curve. Docs are already live at platform.claude.com/docs/en/managed-agents/overview. A redesigned desktop GUI shipped alongside, with split views and inline diff comments, and with a Rubber Duck critic that runs after planning, after implementation, and before tests.

Mechanically: Managed Agents is Anthropic's bid to own the agent-infrastructure layer the same way it owns the model layer. Sandboxed execution, checkpointing, and credential scoping are exactly the primitives that LangGraph, the OpenAI Agents SDK, AutoGen, and the wrapper ecosystem sell. Native primitives mean that Claude-first stacks no longer need LangGraph-style state management bolted on top. Proactive Workflows bring Claude Code into cron/webhook/API-trigger territory; what used to be "wrap claude in a shell script and a systemd timer" is now a first-class routine with built-in safety screens. The Capability Curve number is the strategic message: +25 points on SWE-bench Verified in twelve months. Albert's framing, "set expectations," is Anthropic's bet that the model improves fast enough that elaborate agentic scaffolding becomes the wrong place to invest engineering time.
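The size of the Capability Curve move can be read two ways: as a gain in percentage points, or as a reduction in the share of tasks that still fail. The short Python sketch below works through that arithmetic using only the two scores quoted in this article; the function name `curve_summary` is made up for illustration and is not part of any Anthropic tooling.

```python
def curve_summary(before_pct: float, after_pct: float) -> dict:
    """Summarize a benchmark move in points and in failure-rate terms."""
    gain_points = after_pct - before_pct
    fail_before = 100.0 - before_pct   # share of tasks unsolved before
    fail_after = 100.0 - after_pct     # share of tasks unsolved after
    # Fraction of the previously failing tasks that are now solved.
    relative_failure_drop = (fail_before - fail_after) / fail_before
    return {
        "gain_points": gain_points,
        "fail_before": fail_before,
        "fail_after": fail_after,
        "relative_failure_drop": relative_failure_drop,
    }


# Scores quoted in the article: 62% (Sonnet 3.7) and 87% (Opus 4.7).
summary = curve_summary(62.0, 87.0)
print(summary)
```

Under these numbers the unsolved share falls from 38% to 13%, so roughly two thirds of the tasks that failed a year ago now pass. That is arguably a stronger way to state the change than "+25 points".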

Ecosystem effect: this is the same move OpenAI made with the Agents SDK and the Assistants API. Primitives get pulled up the stack, and the wrapper surface area shrinks. LangChain, LangGraph, CrewAI, AutoGen, and similar Claude-first wrappers lose ground in proportion to how much state and credential plumbing they previously owned. The interesting demarcation is at the protocol layer: MCP keeps the agent-tools dimension open and cross-vendor, while Managed Agents owns the execution dimension. Proactive Workflows + Worktrees + Auto mode + the Rubber Duck critic mean that Claude Code is now positioned as a first-class CI/CD agent runtime, not just a coding assistant. That is the same niche that Cursor's background-agent track and OpenAI's Codex-in-cloud are targeting. Cross-vendor agent orchestration (routing between Claude/Gemini/OpenAI) is still a wrapper-ecosystem play; single-vendor Claude stacks get the most direct benefit.

Monday: if you run Claude Code in custom cron or CI setups, port to Routines this week. You get fewer moving parts, plus Auto mode's destructive-action screen and prompt-injection detection for free. If you are building agent products on LangGraph or AutoGen with Claude as the primary model, audit how much of your state management is now duplicated by Managed Agents (sandboxed exec, checkpointing). On the eval side: 62→87% on SWE-bench Verified means that most genuinely hard real-world Python tasks are now in scope. Prompt patterns engineered when the model was at 62% are now possibly over-fitted and slowing you down, so rerun your own eval set against Opus 4.7 with the simplest scaffolding and compare the result with your current production prompts. The Capability Curve framing is a year-ahead bet: keep your code thin enough that the model getting better is itself the upgrade path.
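The eval comparison suggested above can be sketched in Python as a small harness that runs the same task set through two prompt builders and reports a pass rate for each. Everything here is an assumption for illustration: `Task`, `pass_rate`, `minimal_prompt`, `production_prompt`, and `stub_model` are hypothetical names, and `stub_model` is an offline stand-in that would be replaced with a real model client call.

```python
from typing import Callable, Iterable, NamedTuple


class Task(NamedTuple):
    task_id: str
    description: str
    check: Callable[[str], bool]  # True if the model output passes


def pass_rate(
    tasks: Iterable[Task],
    build_prompt: Callable[[Task], str],
    run_model: Callable[[str], str],
) -> float:
    """Fraction of tasks whose output passes its own check."""
    task_list = list(tasks)
    if not task_list:
        raise ValueError("eval set is empty")
    passed = sum(1 for t in task_list if t.check(run_model(build_prompt(t))))
    return passed / len(task_list)


def minimal_prompt(task: Task) -> str:
    # Simplest scaffolding: the task description and nothing else.
    return task.description


def production_prompt(task: Task) -> str:
    # Stand-in for an older, heavily engineered prompt (hypothetical).
    return "Follow the legacy rules strictly.\n" + task.description


def stub_model(prompt: str) -> str:
    # Offline stand-in so the sketch runs; swap in a real client call.
    return prompt.upper()


tasks = [
    Task("t1", "say ok", lambda out: "OK" in out),
    Task("t2", "say done", lambda out: out == "SAY DONE"),
]

minimal_rate = pass_rate(tasks, minimal_prompt, stub_model)
production_rate = pass_rate(tasks, production_prompt, stub_model)
print(f"minimal: {minimal_rate:.2f}  production: {production_rate:.2f}")
```

The numbers from the stub mean nothing in themselves. What matters is the shape: one task set, one model callable, and two prompt builders. That keeps the comparison between thin and engineered scaffolding like for like.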
