Amazon shipped DevOps Agent to general availability on April 17, 2026, the production release of an autonomous incident investigator that went into preview in December 2025. When a CloudWatch alarm, PagerDuty alert, Dynatrace problem, or ServiceNow ticket fires, the agent takes over without a human prompt: it correlates telemetry, traces dependencies across services, pulls recent deployment and code changes, and proposes a root cause. The launch lands one week after Google's Auto-Diagnose preprint, which used Gemini 2.5 Flash for integration-test log triage with 90.14% root-cause accuracy. Two major cloud vendors shipping LLM-powered SRE triage in the same week is the story, not either product in isolation.
Under the hood it runs on Amazon Bedrock AgentCore, AWS's agent runtime, rather than a bespoke model stack. The integration surface is broad on day one: CloudWatch, Datadog, Dynatrace, New Relic, Splunk, and Grafana on the observability side; GitHub, GitLab, and Azure DevOps on the code and CI/CD side; Azure and on-premises support added at GA. Model Context Protocol (MCP) is the extension mechanism for custom skills, which puts AWS's SRE agent and Anthropic's original MCP spec on the same standards track. Billing is per-second of agent runtime; AWS Support customers get monthly DevOps Agent credits scaled to their support tier; launch regions are Northern Virginia, Ireland, and Frankfurt, plus three others.
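MCP being the extension surface means a custom skill is just another tool server speaking the protocol's JSON-RPC shape: the agent discovers tools via `tools/list` and invokes them via `tools/call`. A minimal sketch of that shape with a hypothetical `lookup_runbook` tool (the tool name, runbook text, and in-process dispatcher are illustrative; a real skill would use an MCP SDK and run over stdio or HTTP):

```python
import json

# Hypothetical runbook store; a real skill would query a wiki or repo.
RUNBOOKS = {
    "checkout-service": "1. Check p99 latency on the payments dependency. "
                        "2. Roll back the last deploy if it landed <30 min before the alarm.",
}

# Tool description in the shape MCP's tools/list returns.
TOOL_SPEC = {
    "name": "lookup_runbook",
    "description": "Return the runbook for a service implicated in an incident.",
    "inputSchema": {
        "type": "object",
        "properties": {"service": {"type": "string"}},
        "required": ["service"],
    },
}

def handle(request: dict) -> dict:
    """Dispatch the two MCP methods an agent uses to discover and call tools."""
    if request["method"] == "tools/list":
        result = {"tools": [TOOL_SPEC]}
    elif request["method"] == "tools/call":
        args = request["params"]["arguments"]
        text = RUNBOOKS.get(args["service"], "no runbook on file")
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

# The agent's side of one tool call, as a plain dict here for illustration.
resp = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
               "params": {"name": "lookup_runbook",
                          "arguments": {"service": "checkout-service"}}})
```

The point of the standard is exactly this symmetry: a skill written once against MCP should be callable by AWS's agent, Claude, or anything else that speaks the protocol.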
Preview metrics from AWS: up to 75% MTTR reduction and 94% root-cause accuracy. Compare to Auto-Diagnose at 90.14% on Google's test corpus, and the convergence is hard to ignore. Two different codebases, two different frontier models, two different target workloads (integration tests versus production incidents), landing inside 4 percentage points of each other. What this tells you: frontier models plus careful prompting plus structured telemetry plus a refusal-on-ambiguity rule are now the ceiling for this task. Neither vendor fine-tuned a custom model; both leaned on prompting discipline and tight integration. The difference that matters for builders is that AWS's agent is cross-vendor by design (it reads your Datadog and talks to your PagerDuty), while Google's is internal-only and not shipping as a product.
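The refusal-on-ambiguity rule is worth making concrete. Neither vendor has published their gate, so this is an illustrative sketch of the idea, not anyone's actual code: only emit a root cause when the top hypothesis is both strong on its own and clearly separated from the runner-up, otherwise refuse.

```python
def propose_root_cause(candidates, min_score=0.8, min_margin=0.2):
    """candidates: list of (hypothesis, evidence_score in [0, 1]).

    Thresholds are illustrative assumptions, not published values.
    """
    if not candidates:
        return "insufficient telemetry: no hypothesis"
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top, score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Refuse unless the lead hypothesis is strong AND clearly separated:
    # a confident wrong answer costs more than an honest refusal.
    if score < min_score or score - runner_up < min_margin:
        return "insufficient telemetry: refusing to guess"
    return top

confident = propose_root_cause([("bad deploy of checkout-service", 0.93),
                                ("GC pressure on cache tier", 0.41)])
ambiguous = propose_root_cause([("bad deploy", 0.62),
                                ("network partition", 0.58)])
```

The accuracy numbers both vendors report are conditioned on a gate like this: refusing the ambiguous cases is precisely what keeps the answered cases near 90%.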
If you run on AWS and have real incident volume, the playbook flips overnight. The integration surface is the tools you already use, and per-second billing means you pay for actual agent runtime, not idle capacity. Two things to watch before trusting it in production. First, per-second pricing at full incident cadence: with 10-minute agent runs across a few hundred incidents a month, this is not the same as adding one more log pipeline. Second, the refusal behavior. Auto-Diagnose's hard anti-hallucination constraint was the single most important engineering choice keeping accuracy high. It is not obvious from AWS's GA announcement whether Bedrock AgentCore enforces the equivalent discipline, or whether it ships confident-sounding wrong answers when telemetry is thin. For builders not on AWS, the signal is that autonomous incident investigation is now a product category with two live vendors and a de facto interop standard in MCP. Expect Azure to ship something equivalent within a quarter, and start rewriting runbooks into agent-legible formats now rather than later.
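The pricing math is worth running before you turn it on everywhere. A back-of-envelope model, with the per-second rate as a placeholder assumption since no actual rate is quoted here:

```python
# Back-of-envelope cost model for per-second agent billing.
# The rate below is a hypothetical placeholder, not AWS's published price.
incidents_per_month = 300
seconds_per_run = 10 * 60          # 10-minute investigations
assumed_rate_per_second = 0.005    # USD/second, assumption for illustration

runtime_seconds = incidents_per_month * seconds_per_run
monthly_cost = runtime_seconds * assumed_rate_per_second
print(f"{runtime_seconds:,} agent-seconds/month at "
      f"${assumed_rate_per_second}/s -> ${monthly_cost:,.2f}/month")
```

Swap in your own incident cadence and the real rate once published; the structural point survives any rate: cost scales linearly with incident volume and investigation length, and the support-tier credits offset a fixed chunk of it.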
