NVIDIA delivered the first Vera CPUs to Anthropic (San Francisco), OpenAI (Mission Bay), SpaceXAI (Palo Alto), and Oracle Cloud Infrastructure (Santa Clara) between May 17 and 20. VP Ian Buck hand-delivered them. Vera is NVIDIA's first CPU positioned as "built for agents." Its headline specs are:
- 88 custom Olympus cores
- 1.2 TB/s of memory bandwidth
- a claimed 50% per-core speedup under full load
- a second-generation NVLink-C2C interconnect for pairing with Rubin GPUs in the Vera Rubin NVL72 reference system

Named recipients include James Bradbury at Anthropic and Sachin Katti at OpenAI. Oracle is the first hyperscale cloud deployment. NVIDIA hasn't disclosed pricing or a general-availability timeline.
The "built for agents" framing is the architectural choice that matters. NVIDIA's previous host CPU, Grace, targeted general HPC and AI workloads. It was a fast CPU sitting next to a fast GPU, handling mostly data movement and orchestration. Vera is sized for what agentic systems do alongside the model: executing tool calls (model-generated Python has to run somewhere), reinforcement learning loops, agent sandboxing, and long-context state management. Buck's quote captures it: "models actually have to generate some Python code to arrive at the correct answer." The CPU is now the workhorse for everything the model emits that gets executed, not just glue between GPU and storage. The standout spec is memory bandwidth. At 1.2 TB/s, Vera is well above mainstream x86 server CPUs, though still far below GPU HBM. That profile suits the sequential, memory-bandwidth-bound agent workloads that flank inference. The 88-core count, by contrast, sits below the top core counts of current x86 server parts, so Vera's edge is bandwidth per core rather than raw core density.
Here is where Vera sits in the AI hardware stack as of May 2026. Earlier this month NVIDIA shipped the NVFP4 4-bit pretraining methodology, which is the GPU-side compute story. Vera is the CPU-side complement, and the Vera Rubin NVL72 reference system pairs both. The strategic move is that NVIDIA is closing the loop on "everything but the model." The agentic workload that runs adjacent to inference can now be NVIDIA silicon end to end. The closest competitors are AMD's MI300A (an APU combining CPU and GPU cores on one package) and Intel's Granite Rapids Xeon server CPUs. Neither was designed with agent workloads as the central use case. For builders running production agent systems in the cloud, Oracle being the first hyperscale deployment matters, because AWS, GCP, and Azure deployments haven't been named yet. Watch for their announcements over the next quarter.
Monday: if you're not one of the four recipients, Vera almost certainly isn't shipping to you next quarter. This is initial sampling to top labs. The real takeaway is what it signals about the next generation of cloud agent infrastructure. Oracle looks set to offer Vera Rubin NVL72 instances ahead of AWS, GCP, and Azure, possibly by a wide margin. Your agent workload may be bottlenecked on CPU-side execution, such as tool calls, RL inner loops, or sandboxed code execution. If so, the next benchmark to track is the relative cost of those operations on Vera versus current Grace or x86 host CPUs. NVIDIA hasn't published Vera-vs-x86 numbers yet. The deeper bet is that hardware architecture now optimizes for "the agent stack around the model," not just "the model itself." That's a meaningful shift if Vera's design choices carry over into other vendors' chips and cloud offerings.
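You can build a baseline now to compare against once Vera instances are available. A crude starting point is to time the CPU-side work your agents already do. The sketch below makes a few assumptions:
- Your tool calls are Python snippets.
- Each snippet runs in a fresh interpreter, as a rough stand-in for a real sandbox.
- `SAMPLE_TOOL_CODE`, `time_tool_call`, and `benchmark` are hypothetical names, not part of any NVIDIA tooling.

```python
import math
import statistics
import subprocess
import sys
import time

# Hypothetical stand-in for model-emitted tool code; replace with snippets
# sampled from your own agent traces.
SAMPLE_TOOL_CODE = "print(sum(i * i for i in range(200_000)))"


def time_tool_call(code: str, timeout_s: float = 10.0) -> float:
    """Run one snippet in a fresh Python subprocess and return wall-clock seconds.

    A fresh interpreter per call is a crude stand-in for a sandbox: the timing
    includes process start-up, which real agent sandboxes also pay in some form.
    """
    start = time.perf_counter()
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    elapsed = time.perf_counter() - start
    if result.returncode != 0:
        raise RuntimeError(f"tool call failed: {result.stderr.strip()}")
    return elapsed


def benchmark(code: str, runs: int = 20) -> dict:
    """Collect per-call latencies for sequential runs and summarize them."""
    samples = sorted(time_tool_call(code) for _ in range(runs))
    # Nearest-rank p95 on the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "runs": runs,
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
        "calls_per_s": runs / sum(samples),  # sequential throughput
    }


if __name__ == "__main__":
    stats = benchmark(SAMPLE_TOOL_CODE)
    for key, value in stats.items():
        print(f"{key}: {value}")
```

Run the same script on your current Grace or x86 hosts now, and on Vera later. The timing includes interpreter start-up, so it measures the per-call overhead an agent actually pays rather than raw core speed.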
