PyTorch 2.11 lands CUDA aarch64 on default PyPI: vLLM-on-Grace is plain pip now, Zubnet AI News

If you maintain Dockerfiles or CI that target Grace, Grace Hopper, or Grace Blackwell boxes, you can now drop the `--index-url https://download.pytorch.org/whl/cu128` workaround: PyTorch 2.11.0 (April 2026) publishes CUDA-enabled GPU wheels to the default PyPI index for aarch64 Linux. Before 2.11.0, `pip install torch` on aarch64 silently pulled CPU-only wheels, and transitive deps could break GPU detection in subtle ways. The fix is packaging, not kernels, but for anyone running vLLM on ARM-host inference boxes it removes a chronic source of "why isn't CUDA available" debugging.
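To make the "why isn't CUDA available" check concrete, here is a minimal sketch of a post-install sanity check. It relies on `torch.version.cuda` (which is `None` on builds without CUDA) and `torch.cuda.is_available()`; the helper names `classify_build` and `describe_torch_build` and the status strings are illustrative assumptions, not part of PyTorch or vLLM.

```python
import platform


def classify_build(cuda_version, cuda_available):
    """Label a torch install from torch.version.cuda and torch.cuda.is_available()."""
    if cuda_version is None:
        # Builds compiled without CUDA report no CUDA version at all.
        return "cpu-only wheel"
    if not cuda_available:
        # CUDA build, but no usable driver or GPU is visible at runtime.
        return "cuda wheel, no usable GPU"
    return "cuda wheel, GPU visible"


def describe_torch_build():
    """Return a small report dict; torch is imported lazily so this runs anywhere."""
    report = {
        "machine": platform.machine(),  # e.g. "aarch64" on Grace hosts
        "torch": None,
        "status": "torch not installed",
    }
    try:
        import torch
    except ImportError:
        return report
    report["torch"] = torch.__version__
    report["status"] = classify_build(torch.version.cuda, torch.cuda.is_available())
    return report


if __name__ == "__main__":
    print(describe_torch_build())
```

Running this as the last step of an image build or CI job would surface a silent CPU-only install before it reaches an inference box.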

The mechanism is the NVIDIA/Astral Wheel Variants standard, which lets PyPI distinguish architecture/accelerator-specific builds under a single package name. PyTorch's implementation links dynamically to NCCL and cuBLAS rather than bundling them statically, which is what keeps the wheels small enough to live on PyPI in the first place. The supported host platforms named are GB200, GB300, and GH200 (Grace Blackwell and Grace Hopper systems). vLLM had carried two interim workarounds: `use_existing_torch.py`, which strips torch from the install files, and `[tool.uv] no-build-isolation-package = ["torch"]` in pyproject.toml. Both remain useful for custom/nightly torch builds but are no longer mandatory for stock installs.

Wider stack implications: Grace Hopper and Grace Blackwell, and now Vera (NVIDIA's agent-optimized 88-core CPU paired with Rubin GPUs), are all ARM-host plus NVIDIA GPU topologies. They're the systems behind the Vera Rubin NVL72 reference design and behind the GH200/GB200 instances offered by Oracle Cloud, CoreWeave, Lambda, Nebius, and similar operators. Until 2.11, ARM-host AI dev meant a branch in every install script that knew to swap the PyPI index. That branch is now optional. Beyond PyTorch specifically, Wheel Variants is the standard that lets the broader GPU Python ecosystem model "architecture × accelerator" as a first-class packaging dimension rather than through ad-hoc index URLs. Adoption by JAX, CuPy, Triton, and others is the longer-running story to track.

For Monday: bump your pin to `torch>=2.11.0` in your Grace/GH200/GB200 builds and remove the index-url override. If you depend on torch nightlies or custom builds, keep the vLLM workarounds; they still buy you something. The longer-term action is to watch Wheel Variants adoption across the Python GPU stack. When JAX/CuPy/Triton ship on the same standard, the x86-vs-aarch64 branching in your install logic disappears entirely. For teams planning deployments on Vera Rubin NVL72-class hardware later this year, this is the first piece of the developer-experience plumbing landing in stable. The kernel-level perf story for ARM-host inference is separate and still maturing, but the install-it-and-go problem is now solved.
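As a rough aid for that cleanup, the sketch below finds and strips the PyTorch index override from install lines in a Dockerfile or CI script. The function names and the sample Dockerfile are hypothetical, and the regex only covers the `--index-url` / `--extra-index-url` forms pointing at `download.pytorch.org/whl/`; treat it as a starting point to review by hand, not a drop-in migration tool.

```python
import re

# Matches "--index-url" or "--extra-index-url" pointing at the PyTorch wheel
# index, with a space or "=" between the flag and the URL.
_PYTORCH_INDEX = re.compile(
    r"\s*--(?:extra-)?index-url[ =]https://download\.pytorch\.org/whl/\S+"
)


def strip_pytorch_index(line):
    """Remove a PyTorch wheel-index override from a single install line."""
    return _PYTORCH_INDEX.sub("", line)


def find_overrides(text):
    """Return (line_number, line) pairs that still carry the override."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if _PYTORCH_INDEX.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical Dockerfile contents, for illustration only.
    dockerfile = (
        "FROM ubuntu:24.04\n"
        "RUN pip install torch --index-url https://download.pytorch.org/whl/cu128\n"
        "RUN pip install vllm\n"
    )
    for number, line in find_overrides(dockerfile):
        print(number, strip_pytorch_index(line))
```

Teams that still build against nightlies or custom torch wheels would likely want to run only `find_overrides` and decide case by case, since those installs may still need an explicit index.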

