PyTorch 2.11 puts CUDA aarch64 wheels on default PyPI: vLLM-on-Grace is now plain pip, Zubnet AI News

If you maintain Dockerfiles or CI that target Grace, Grace Hopper, or Grace Blackwell boxes, you can now drop the `--index-url https://download.pytorch.org/whl/cu128` workaround: PyTorch 2.11.0 (April 2026) publishes CUDA-enabled GPU wheels for aarch64 Linux on the default PyPI index. Before 2.11.0, `pip install torch` on aarch64 silently pulled CPU-only wheels, and transitive dependencies could break GPU detection in subtle ways. The fix is in packaging, not kernels, but for anyone running vLLM on ARM-host inference boxes it removes a chronic source of "why is CUDA not available" debugging.
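A quick way to confirm which kind of wheel an install actually pulled is to ask torch itself. The sketch below is a minimal illustration, not part of the PyTorch release: the helper name `classify_torch_build` and its three labels are hypothetical, while `torch.version.cuda` and `torch.cuda.is_available()` are the standard torch attributes it reads.

```python
from typing import Optional


def classify_torch_build(cuda_version: Optional[str], cuda_available: bool) -> str:
    """Classify a torch install from two values that torch reports about itself.

    cuda_version is torch.version.cuda, which is None on builds without CUDA.
    cuda_available is the result of torch.cuda.is_available().
    The helper name and the returned labels are hypothetical.
    """
    if cuda_version is None:
        # The silent failure mode described above: a CPU-only wheel was installed.
        return "cpu-only wheel"
    if not cuda_available:
        # A CUDA build is present, but no usable driver or GPU was found.
        return "cuda wheel, no usable GPU"
    return "cuda wheel, GPU visible"


if __name__ == "__main__":
    import platform

    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        status = classify_torch_build(torch.version.cuda, torch.cuda.is_available())
        print(f"{platform.machine()} torch {torch.__version__}: {status}")
```

Running this as a CI step right after the install makes a CPU-only wheel visible at build time instead of at the first inference request.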

The mechanism is the NVIDIA/Astral Wheel Variants standard, which lets PyPI distinguish architecture- and accelerator-specific builds under a single package name. PyTorch's implementation links NCCL and cuBLAS dynamically instead of bundling them statically, which is what keeps the wheels small enough to host on PyPI. The named supported host platforms are GB200, GB300, and GH200 (Grace Blackwell and Grace Hopper systems). vLLM had carried interim workarounds: `use_existing_torch.py`, which strips torch from the install files, and `[tool.uv] no-build-isolation-package = ["torch"]` in pyproject.toml. Both remain useful for custom or nightly torch builds but are no longer mandatory for stock installs.

Wider stack implications. Grace Hopper and Grace Blackwell, and now Vera (NVIDIA's agent-optimized 88-core CPU, paired with Rubin GPUs), are all ARM-host plus NVIDIA-GPU topologies. These are the systems behind the Vera Rubin NVL72 reference design and behind the GH200/GB200 instances offered by Oracle Cloud, CoreWeave, Lambda, Nebius, and similar operators. Until 2.11, ARM-host AI development meant a branch in every install script that knew how to swap the PyPI index. That branch is now optional. Beyond PyTorch specifically, Wheel Variants is the standard that lets the broader GPU Python ecosystem model "architecture × accelerator" as a first-class packaging dimension instead of relying on ad-hoc index URLs. Adoption by JAX, CuPy, Triton, and others is the longer-running story to track.

Monday: bump your Grace/GH200/GB200 builds to `torch>=2.11.0` and remove the index-url override. If you depend on torch nightlies or custom builds, keep the vLLM workarounds; they still earn their place. Longer-term action: watch Wheel Variants adoption across the Python GPU stack. Once JAX, CuPy, and Triton ship on the same standard, the x86-vs-aarch64 branching in your install logic disappears entirely. For teams planning deployments on Vera Rubin NVL72-class hardware later this year, this is the first piece of developer-experience plumbing to land in a stable release. The kernel-level performance story for ARM-host inference is separate and still maturing, but the install-it-and-go problem is now solved.
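For install scripts that still have to support pinned older torch versions during the transition, the index-swapping branch can be reduced to one version check. The sketch below is illustrative only: `torch_pip_args` and `parse_release` are hypothetical helpers, the 2.11.0 cutoff and the cu128 URL are taken from this article, and it assumes the pinned older version is actually published on that index. Pre-release and nightly version strings are not handled specially.

```python
import platform
import re
from typing import List, Optional, Tuple

# Index URL quoted in the article; needed only before the 2.11.0 cutoff.
CU128_INDEX = "https://download.pytorch.org/whl/cu128"
# First release the article says ships CUDA aarch64 wheels on default PyPI.
FIRST_PYPI_CUDA_AARCH64 = (2, 11, 0)


def parse_release(version: str) -> Tuple[int, int, int]:
    """Parse '2.11.0', '2.9', or '2.11.0+cu128' into a (major, minor, patch) tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def torch_pip_args(torch_version: str, machine: Optional[str] = None) -> List[str]:
    """Build pip arguments for a pinned torch, adding the index override only when needed."""
    machine = machine or platform.machine()
    args = ["install", f"torch=={torch_version}"]
    # Only aarch64 Linux hosts on pre-2.11.0 torch still need the explicit CUDA index.
    if machine == "aarch64" and parse_release(torch_version) < FIRST_PYPI_CUDA_AARCH64:
        args += ["--index-url", CU128_INDEX]
    return args
```

The returned list can be passed to `subprocess.run([sys.executable, "-m", "pip", *torch_pip_args("2.11.0")], check=True)`. Once every pin in a project is at or above 2.11.0, the conditional never fires and the helper can be deleted.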
