NVIDIA released the Multipath Reliable Connection (MRC) protocol as an open specification through the Open Compute Project today, after running it in production on Spectrum-X Ethernet hardware. MRC is a new RDMA transport that lets a single connection distribute traffic across multiple network paths, improving throughput, load balancing, and availability for large-scale AI training fabrics. The structural news: NVIDIA is making the protocol open rather than keeping it proprietary, which means non-NVIDIA fabric vendors can implement compatible silicon and switches. OpenAI, Microsoft (Fairwater datacenter), and Oracle (OCI Abilene) are cited as production users, with OpenAI specifically noting that MRC "enabled us to avoid much of the typical network-related slowdowns" at training scale. There is no new hardware SKU; MRC runs on existing ConnectX SuperNICs and Spectrum-X switches.
The mechanism is what matters for builders running large training jobs. Standard RDMA over Ethernet (RoCEv2) pins each connection to a single network path; if that path congests or fails, the connection stalls until timeout-driven retransmission catches up. At gigascale training, where collective operations involve thousands of GPUs communicating simultaneously, single-path RDMA hits congestion repeatedly, and timeout-based recovery is too slow: you lose minutes per incident, multiplied by the frequency of network hiccups across a 100,000-GPU fabric. MRC distributes a single RDMA connection across multiple paths in parallel, hardware-accelerates failover to microseconds, dynamically steers around congested paths, and retransmits selectively rather than falling back to TCP-style timeouts. The OpenAI testimonial maps to a known training-economics line item: every minute of network stall at multi-thousand-GPU scale burns hundreds of dollars in idle compute; MRC is the protocol that turns that minute into milliseconds.
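The recovery-time gap described above can be sketched numerically. This is a toy model, not the MRC spec: the retransmission timeout, failover latency, GPU count, and $/GPU-hour figures are all illustrative assumptions, chosen only to show why timeout-driven recovery dominates stall cost at scale.

```python
# Toy model of single-path vs. multipath RDMA stall time and its cost.
# All constants are illustrative assumptions, not values from the MRC spec.

RETX_TIMEOUT_MS = 4000.0   # assumed RoCEv2-style retransmission timeout
FAILOVER_US = 10.0         # assumed hardware multipath failover latency

def single_path_stall_ms(path_incidents: int) -> float:
    """Single-path RDMA: each path failure stalls the whole connection
    until the retransmission timeout fires."""
    return path_incidents * RETX_TIMEOUT_MS

def multipath_stall_ms(path_incidents: int) -> float:
    """Multipath transport: a failed path is evicted in microseconds and
    traffic continues on surviving paths, so stall time is near zero."""
    return path_incidents * (FAILOVER_US / 1000.0)

def stall_cost_usd(stall_ms: float, gpus: int = 10_000,
                   gpu_hour_usd: float = 2.0) -> float:
    """Idle-compute cost of a stall: every GPU waits out the stall."""
    return (stall_ms / 3_600_000.0) * gpus * gpu_hour_usd

if __name__ == "__main__":
    incidents = 3
    print(f"single-path stall: {single_path_stall_ms(incidents):.0f} ms, "
          f"cost ${stall_cost_usd(single_path_stall_ms(incidents)):.2f}")
    print(f"multipath stall:   {multipath_stall_ms(incidents):.2f} ms, "
          f"cost ${stall_cost_usd(multipath_stall_ms(incidents)):.4f}")
    # A full stalled minute on a 10k-GPU cluster at $2/GPU-hour:
    print(f"one stalled minute: ${stall_cost_usd(60_000):.2f}")
```

Under these assumptions, a single stalled minute on a 10,000-GPU cluster costs roughly $330 in idle compute, which is the hundreds-of-dollars-per-minute line item the OpenAI quote points at.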
The ecosystem read pairs with this week's two earlier infra pieces. Astera Labs Scorpio is the open memory-semantic fabric switch built for non-NVIDIA training stacks (UALink-aligned). Google's 8th-gen TPU is the vertically integrated alternative (training silicon, inference silicon, and the Boardfly topology, all designed together). NVIDIA's MRC sits in the middle: NVIDIA hardware is required to get silicon-accelerated multipath performance, but the protocol itself is now open and other vendors can implement it. The strategic read is NVIDIA conceding that closed protocols at the fabric layer are slowing adoption — hyperscaler customers want optionality even when they're committed to NVIDIA at the GPU layer. Open-spec protocol plus Spectrum-X-only acceleration is the same playbook NVIDIA ran with NVLink (open spec, NVIDIA-only chips initially), and now there's competitive pressure to run it at the fabric layer. For builders, the practical implication is that gigascale training fabrics are converging on multipath RDMA as the standard primitive, regardless of which silicon vendor runs the actual switches. The fabric-layer compatibility story just got better.
Practical move: if you operate AI training infrastructure at multi-thousand-GPU scale, MRC support belongs in your evaluation criteria for any fabric purchase this year. The OCP spec means you can evaluate compatible silicon from non-NVIDIA vendors as it ships. If you're a smaller training shop (sub-1,000 GPUs), single-path RDMA is still adequate; multipath complexity doesn't pay off until network-path failures actually hit your collective operations often enough to matter. For neoclouds and hyperscalers building AI compute capacity, MRC-compatible silicon is now a procurement question, not just an NVIDIA Spectrum-X question. The OpenAI/Microsoft/Oracle deployment names imply the protocol has been hardened in production at the largest current scale, which meaningfully derisks the technology versus an early-stage open spec. The watch: which non-NVIDIA fabric vendors implement MRC first, and whether silicon-level acceleration is achievable on Astera-class switches or requires NVIDIA-tier hardware integration.
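The sizing rule of thumb above can be written down as a rough procurement check. The 1,000-GPU threshold comes from the text's sub-1,000-GPU rule of thumb; the incident-rate cutoff is an illustrative assumption, not vendor guidance.

```python
# Rough go/no-go on prioritizing multipath RDMA (e.g. MRC) in a fabric purchase.
# The 1,000-GPU threshold follows the rule of thumb in the text; the
# incident-rate cutoff is an illustrative assumption.

def prioritize_multipath_rdma(gpu_count: int,
                              path_incidents_per_day: float) -> bool:
    """True if multipath RDMA support should be a hard requirement."""
    if gpu_count >= 1000:  # multi-thousand-GPU scale: always evaluate it
        return True
    # Smaller shops: only worth the complexity if path failures actually
    # hit collective operations often enough to matter.
    return path_incidents_per_day >= 1.0

print(prioritize_multipath_rdma(10_000, 0.1))  # large cluster
print(prioritize_multipath_rdma(512, 0.05))    # small, stable fabric
```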
