Code With Claude: Managed Agents, cron routines, SWE-bench jumps 62 to 87%, Zubnet AI News

Anthropic ran Code With Claude this week and shipped three things that change what Claude-based builders maintain themselves. Managed Agents are native primitives for sandboxed code execution, checkpointing, and credential scoping. Proactive Workflows are Claude Code routines that fire on cron schedules, GitHub webhooks, or API endpoints; they come with Auto mode, which adds destructive-action screening and prompt-injection detection, plus worktrees for isolated branch management. The Capability Curve is the framing: SWE-bench Verified moved from 62% (Sonnet 3.7, a year ago) to 87% (Opus 4.7, now). PM demos were given by Jess Yan and Lance Martin; Alex Albert presented the curve. Docs are already live at platform.claude.com/docs/en/managed-agents/overview. A redesigned desktop GUI with split views and inline diff comments shipped alongside, with a Rubber Duck critic that runs after planning, after implementation, and before tests.

Mechanically: Managed Agents is Anthropic's bid to own the agent-infrastructure layer the same way it owns the model layer. Sandboxed execution + checkpointing + credential scoping are exactly the primitives LangGraph, OpenAI Agents SDK, AutoGen, and the wrapper ecosystem sell. Native primitives mean that Claude-first stacks no longer need LangGraph-style state management bolted on top. Proactive Workflows brings Claude Code into cron/webhook/API-trigger territory; what previously meant "wrap claude in a shell script and a systemd timer" is now a first-class routine with safety screens built in. The Capability Curve number is the strategic message: +25 points on SWE-bench Verified (62% to 87%) in twelve months. Albert's framing, "set expectations", is Anthropic betting that the model improves fast enough that elaborate agentic scaffolding becomes the wrong place to invest engineering time.
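The headline numbers can be read two ways, and a few lines of arithmetic make the difference concrete. The sketch below uses only the two scores quoted above (62% and 87%); the variable names are illustrative, and the "relative drop in unresolved tasks" view is an added reading of those numbers, not a figure from the event.

```python
# Scores quoted in the article (SWE-bench Verified, percent of tasks resolved).
before_pct = 62  # Sonnet 3.7, a year ago
after_pct = 87   # Opus 4.7, now

# Absolute gain in percentage points.
point_gain = after_pct - before_pct

# The same change seen from the failure side: share of tasks left unresolved.
failures_before = 100 - before_pct
failures_after = 100 - after_pct

# Relative reduction in unresolved tasks (an illustrative derived metric).
failure_reduction = (failures_before - failures_after) / failures_before

print(f"gain: +{point_gain} points")
print(f"unresolved tasks: {failures_before}% -> {failures_after}%")
print(f"relative drop in unresolved tasks: {failure_reduction:.0%}")
```

Read from the failure side, the unresolved share falls from 38% to 13%, which is roughly a two-thirds reduction. That is arguably the more relevant number when deciding how much retry and recovery scaffolding is still worth maintaining.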

Ecosystem effect: this is the same move OpenAI made with the Agents SDK and Assistants API. Primitives get pulled up the stack, and wrapper surface area shrinks. In Claude-first stacks, LangChain, LangGraph, CrewAI, AutoGen, and similar wrappers lose ground in proportion to how much state and credential plumbing they previously owned. The interesting demarcation is at the protocol layer: MCP keeps the agent-tools dimension open and cross-vendor while Managed Agents owns the execution dimension. Together, Proactive Workflows + Worktrees + Auto mode + the Rubber Duck critic position Claude Code as a first-class CI/CD agent runtime, not just a coding assistant, which is the same niche Cursor's background-agent track and OpenAI's Codex-in-cloud are aiming at. Cross-vendor agent orchestration (routing between Claude/Gemini/OpenAI) is still a wrapper-ecosystem play; single-vendor Claude stacks get the most direct benefit.

Monday: if you run Claude Code in custom cron or CI setups, port to Routines this week. You get fewer moving parts, and you inherit Auto mode's destructive-action screen and prompt-injection detection for free. If you're building agent products on top of LangGraph or AutoGen with Claude as the primary model, audit how much of your state management is now duplicated by Managed Agents (sandboxed exec, checkpointing). On the eval side: 62→87% on SWE-bench Verified means most genuinely hard real-world Python tasks are now in scope; prompt patterns engineered when the model was at 62% are likely over-fitted and slowing you down. Rerun your own eval set against Opus 4.7 with the simplest possible scaffolding and compare against your current production prompts. The Capability Curve framing is the year-ahead bet: keep your code thin enough that the model getting better is the upgrade path.
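The eval comparison suggested above can be sketched as a small harness. Everything here is hypothetical: `Task`, `Scaffold`, `pass_rate`, and `compare` are illustrative names, and the two `fake_*` functions are stand-ins that return canned strings instead of calling any model. Nothing below uses a real Anthropic API; swap in your own client calls and your own eval set.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Task:
    """One item from your own eval set (hypothetical shape)."""
    prompt: str
    check: Callable[[str], bool]  # True if the final output passes


# A scaffold is any callable that turns a task prompt into a final output.
Scaffold = Callable[[str], str]


def pass_rate(scaffold: Scaffold, tasks: Sequence[Task]) -> float:
    """Fraction of tasks whose output passes its own check."""
    if not tasks:
        raise ValueError("eval set is empty")
    passed = sum(1 for task in tasks if task.check(scaffold(task.prompt)))
    return passed / len(tasks)


def compare(minimal: Scaffold, production: Scaffold,
            tasks: Sequence[Task]) -> dict[str, float]:
    """Run both scaffolds on the same tasks and report the difference."""
    minimal_rate = pass_rate(minimal, tasks)
    production_rate = pass_rate(production, tasks)
    return {
        "minimal": minimal_rate,
        "production": production_rate,
        "delta": minimal_rate - production_rate,  # > 0: thin scaffold wins
    }


# Toy stand-ins so the sketch runs offline; they say nothing about real results.
def fake_minimal(prompt: str) -> str:
    # Replace with: send the bare task to the model, return its answer.
    return {"add": "4", "upper": "ABC"}.get(prompt, "")


def fake_production(prompt: str) -> str:
    # Replace with: your current production prompt chain.
    return {"add": "4"}.get(prompt, "")


tasks = [
    Task("add", lambda out: out == "4"),
    Task("upper", lambda out: out == "ABC"),
]

result = compare(fake_minimal, fake_production, tasks)
print(result)
```

Keeping both scaffolds behind the same one-argument callable is the design choice that matters: the eval set and the checks stay fixed, so any difference in pass rate comes from the scaffolding rather than from the harness.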
