TechCrunch reported Friday that Meta has signed a deal for millions of Amazon Web Services Graviton CPUs, specifically framed as capacity for agentic AI workloads rather than model training or inference. The deal adds to Meta's February 2026 agreement with Nvidia for standalone Grace CPUs, which explicitly unbundled the CPU from the GPU in Meta's infrastructure roadmap. The Graviton demand side is corroborated by separate reporting that two large AWS customers tried this year to buy out the entirety of AWS's 2026 Graviton instance capacity. AWS refused, citing other customers' needs. The chip industry's attention has been on GPUs for three years. The story for the next two is going to be CPUs.

The technical reason is mechanical. A model forward pass runs on GPUs. Everything else in an agentic workflow runs on CPUs. That includes prompt assembly, tool invocation, result parsing, state tracking across multi-step reasoning chains, orchestration between tool calls, retry logic, logging, and the glue code that ties a model's outputs into whatever action the agent needs to take next. A single agentic task that takes a minute of user wall-clock time may involve hundreds of CPU-seconds of orchestration for every GPU-second of inference. As agents become the dominant LLM deployment pattern, that ratio moves the bottleneck from matmul throughput to CPU core count and single-thread latency. Graviton cores are ARM-based, cache-heavy, and priced well below comparable Xeon or EPYC parts; that is exactly the profile agent orchestration wants.
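The split described above can be made concrete with a toy orchestration loop. Everything here is hypothetical: the model call and tool are stubs, and the timings are illustrative, not measurements. The point is structural, showing where an agent step spends CPU time (prompt assembly, parsing, tool invocation) versus where it waits on the GPU (the forward pass).

```python
import time

def call_model(prompt: str) -> str:
    """Stub for the GPU-side forward pass (hypothetical latency)."""
    time.sleep(0.05)  # stand-in for inference latency
    return f"TOOL:search({prompt[:20]})"

def run_tool(action: str) -> str:
    """Stub for a CPU-side tool invocation."""
    return f"result-for-{action}"

def agent_step(task: str, max_calls: int = 3) -> tuple[str, float, float]:
    """One agent task: returns (context, cpu_orchestration_s, model_s)."""
    cpu_s = model_s = 0.0
    context = task
    for _ in range(max_calls):
        # CPU side: prompt assembly and state tracking.
        t0 = time.perf_counter()
        prompt = f"Task: {task}\nContext so far: {context}"
        cpu_s += time.perf_counter() - t0

        # GPU side: the forward pass.
        t0 = time.perf_counter()
        action = call_model(prompt)
        model_s += time.perf_counter() - t0

        # CPU side again: result parsing and tool invocation.
        t0 = time.perf_counter()
        if action.startswith("TOOL:"):
            context += "\n" + run_tool(action)
            cpu_s += time.perf_counter() - t0
        else:
            break
    return context, cpu_s, model_s

context, cpu_s, model_s = agent_step("summarize the Graviton announcement")
print(f"orchestration: {cpu_s:.4f}s, model: {model_s:.4f}s")
```

In a real agent the CPU-side work between model calls (JSON parsing, HTTP requests to tools, retries, logging) is far heavier than this stub, which is the mechanism behind the hundreds-to-one ratio in the paragraph above.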

The commercial picture fits. AWS has deployed 1.4 million Trainium chips as of March 2026, with 500,000 Trainium2 chips concentrated in Project Rainier, and the Graviton5 generation launched recently at 192 cores with 180MB of L3 cache. Meta is simultaneously running Nvidia Grace (the February 2026 deal), AWS Graviton (this week), Broadcom custom silicon (the April 2026 extension for custom AI processors), and its own MTIA internal accelerators. That diversification is the tell. Meta is not betting on any single CPU vendor because the competitive dynamic between Grace, Graviton, EPYC, Xeon, and hyperscaler-custom silicon is still open, and Meta does not want to be cornered by a single supplier when inference and orchestration volumes grow another 10x. Amazon's position in this picture is unusual because it sells capacity both to direct competitors and to Anthropic, which itself just took a $25B investment from Amazon with a cloud spend commitment attached.

For builders, the practical read is simple. First, if you are architecting an agentic system, the cost model shifts. GPU inference is still the most expensive component per token, but CPU orchestration time can dominate total cost of goods as you add tool calls, retries, and complex state machines. Benchmarking on a CPU-rich instance against a GPU-biased one becomes worth doing rather than assuming the answer. Second, the inference-provider landscape will continue to shift toward vendors with CPU capacity alongside GPU capacity; pure GPU-focused shops like CoreWeave and Lambda have historically optimized for training throughput but are building out CPU capacity now specifically because agent workloads need it. Third, if your application is bottlenecked by agent orchestration, you likely have more room for optimization on the CPU axis than on the GPU axis, because ARM-based cloud CPUs have gotten cheap quickly. The AI infrastructure story in 2026 is no longer about who has the most H100s. It is about who has built the silicon and the scheduling software to run agents at scale, and that is a different shape of question.
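The cost-model shift can be sketched with a back-of-envelope calculation. All of the numbers below are assumptions for illustration, not quoted cloud prices: hypothetical per-second rates for GPU and CPU time, a fixed GPU cost per model call, and a CPU cost per step that grows linearly as accumulated agent state gets re-serialized on each call.

```python
# Hypothetical per-second prices; real cloud pricing will differ.
GPU_COST_PER_S = 0.0014   # assumed cost of one accelerator-second
CPU_COST_PER_S = 0.00004  # assumed cost of one ARM vCPU-second

def task_cost(tool_calls: int) -> tuple[float, float]:
    """Return (gpu_cost, cpu_cost) in dollars for one agent task."""
    # GPU: a fixed-latency forward pass per tool call (assumed 0.5s).
    gpu_s = tool_calls * 0.5
    # CPU: orchestration per step grows with accumulated context
    # (assumed linear model: 2s base + 0.5s per prior step).
    cpu_s = sum(2.0 + 0.5 * i for i in range(tool_calls))
    return gpu_s * GPU_COST_PER_S, cpu_s * CPU_COST_PER_S

for calls in (1, 10, 100):
    g, c = task_cost(calls)
    print(f"{calls:3d} calls: GPU ${g:.4f}, CPU ${c:.4f}, "
          f"CPU share {c / (g + c):.0%}")
```

Under these assumed rates the CPU share of total cost climbs as the task gets longer, which is the point of the paragraph above: simple agents are GPU-dominated, but deep tool-calling loops push spend toward the CPU axis, where ARM instances give you more room to optimize.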