
Data Centers

Also known as: AI Data Centers, GPU Clusters
Physical facilities that house the servers, GPUs, networking equipment, and cooling systems needed to train and run AI models. Modern AI data centers are purpose-built for massive parallel compute, consuming megawatts of power and requiring specialized cooling. A single frontier-model training run can occupy thousands of GPUs across an entire facility for months.

Why It Matters

Data centers are the factories of the AI era. Every query to Claude, every Midjourney image, every Runway video runs on hardware sitting in one of these buildings. The global shortage of AI-ready data center capacity is one of the biggest constraints on AI growth, and one of the biggest investment opportunities.

Deep Dive

An AI data center is not just a bigger version of a traditional server farm. The fundamental constraint has shifted from compute density to power density. A standard enterprise rack consumes 7–10 kilowatts; a rack loaded with eight NVIDIA H100 GPUs draws 40–70 kW, and next-generation GB200 NVL72 racks push past 120 kW. This means an AI data center with the same floor space as a conventional facility might need 5–10 times the electrical capacity. Securing that much power — often 100+ megawatts per facility — has become the primary bottleneck, which is why companies like Microsoft, Amazon, and Google are signing deals with nuclear plants, exploring small modular reactors, and reviving decommissioned power stations just to feed their GPU clusters.
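The power math above can be sketched in a few lines. This is a back-of-the-envelope illustration using the rack figures from the paragraph; the PUE (power usage effectiveness) overhead of 1.3 is an assumed value, not a quoted spec.

```python
# Back-of-the-envelope power budget for an AI data hall.
# Rack figures follow the text above; PUE is an illustrative assumption.

AIR_COOLED_RACK_KW = 10      # typical enterprise rack (7-10 kW range)
H100_RACK_KW = 55            # midpoint of the 40-70 kW range for 8x H100
GB200_NVL72_RACK_KW = 120    # next-generation liquid-cooled rack

def facility_megawatts(racks: int, kw_per_rack: float, pue: float = 1.3) -> float:
    """Total facility draw in MW, including cooling/overhead via PUE."""
    return racks * kw_per_rack * pue / 1000

# A 1,000-rack hall at H100 densities needs grid-scale power:
print(facility_megawatts(1000, H100_RACK_KW))   # ~71.5 MW
print(facility_megawatts(1000, GB200_NVL72_RACK_KW))  # ~156 MW
```

The same floor space at GB200 densities more than doubles the electrical requirement, which is why power procurement, not floor space, is the binding constraint.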

The Cooling Challenge

Traditional air cooling simply cannot handle modern AI workloads. When you pack thousands of GPUs drawing 700 watts each into a confined space, the heat output is staggering — a single eight-GPU H100 server produces roughly the thermal load of several space heaters running full blast. This has pushed the industry toward liquid cooling at unprecedented speed. Direct-to-chip liquid cooling, where coolant flows through cold plates mounted directly on the GPU, is now standard in new AI facilities. Some operators are going further with full immersion cooling, submerging entire servers in dielectric fluid. NVIDIA's GB200 systems essentially require liquid cooling — there's no practical air-cooled configuration. This shift has massive implications for existing data centers: retrofitting a facility designed for air cooling to support liquid cooling often means ripping out raised floors, adding plumbing infrastructure, and upgrading the building's structural capacity to handle the weight of coolant-filled systems.
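To get a feel for the plumbing a direct-to-chip loop implies, the basic heat-transfer relation Q = ṁ·c_p·ΔT gives the coolant flow a rack needs. This is a simplified sketch assuming plain water and an assumed 10 K supply-to-return temperature rise; real loops typically use water-glycol mixes and vendor-specified setpoints.

```python
# Coolant flow needed to absorb a rack's heat: m_dot = Q / (c_p * dT).
# Illustrative only; assumes plain water and a 10 K temperature rise.

WATER_CP = 4186.0  # J/(kg*K), specific heat capacity of water

def coolant_flow_lpm(rack_kw: float, delta_t_k: float = 10.0) -> float:
    """Litres per minute of water required to carry away rack_kw of heat
    with a delta_t_k rise between supply and return lines."""
    kg_per_s = rack_kw * 1000 / (WATER_CP * delta_t_k)
    return kg_per_s * 60  # 1 kg of water is ~1 litre

print(round(coolant_flow_lpm(120), 1))  # ~172 L/min for a 120 kW rack
```

Moving on the order of 170 litres of coolant per minute through every rack is why retrofits involve real plumbing infrastructure, not just new fans.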

Networking Inside the Building

The network fabric inside an AI data center is where the real engineering complexity lives. When 10,000 GPUs need to synchronize gradient updates during a training run, the interconnect has to deliver massive bandwidth with minimal latency and near-zero packet loss. InfiniBand, originally developed for high-performance computing, dominates AI training clusters because it offers 400 Gb/s per port with NDR (with 800 Gb/s XDR arriving in production) and features like RDMA that bypass the CPU entirely for data transfers. Ethernet is catching up — the Ultra Ethernet Consortium and NVIDIA's Spectrum-X are pushing 800 GbE with RoCE (RDMA over Converged Ethernet) — but InfiniBand remains the default for serious training workloads. The network topology matters too: fat-tree and rail-optimized designs ensure that any GPU can communicate with any other GPU at full bandwidth, which is critical when your parallelism strategy splits a model across hundreds of nodes.
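Why bandwidth dominates the design becomes clear from a simplified ring all-reduce estimate: each GPU must move roughly 2·(N−1)/N times the gradient size per synchronization step. The sketch below uses hypothetical figures (a 140 GB fp16 gradient, 1,024 GPUs, one 400 Gb/s port per GPU) and ignores latency and compute overlap, so it is a lower bound, not a benchmark.

```python
# Rough lower bound on gradient sync time for a ring all-reduce:
# each GPU sends/receives ~2*(N-1)/N * S bytes for an S-byte gradient.
# Hypothetical inputs; ignores latency, congestion, and compute overlap.

def allreduce_seconds(grad_gb: float, n_gpus: int, link_gbps: float = 400.0) -> float:
    """Time to all-reduce grad_gb gigabytes over per-GPU links of
    link_gbps gigabits per second (e.g. one 400 Gb/s InfiniBand port)."""
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * grad_gb  # per-GPU traffic
    return traffic_gb * 8 / link_gbps                 # GB -> Gb, then Gb / (Gb/s)

# e.g. ~140 GB of fp16 gradients for a 70B-parameter model, 1,024 GPUs:
print(round(allreduce_seconds(140, 1024), 2))  # several seconds per full sync
```

Even this idealized estimate lands at multiple seconds per synchronization at 400 Gb/s, which is why clusters lean on non-blocking fat-tree topologies and lossless transports (InfiniBand, RoCE) rather than commodity Ethernet.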

Geography and Strategy

Where you build an AI data center is a strategic decision driven by power availability, climate, fiber connectivity, and increasingly, geopolitics. Northern Virginia (Ashburn corridor) hosts the densest concentration of data centers on Earth, but power constraints are pushing new builds to places like central Texas, the Nordic countries, and the Middle East. Cold climates reduce cooling costs — Meta's data center in Luleå, Sweden uses outside air for cooling most of the year. Cheap hydroelectric power draws facilities to Québec and the Pacific Northwest. Meanwhile, sovereign AI initiatives are driving countries like Saudi Arabia, the UAE, and India to build domestic GPU clusters so they don't depend on American hyperscalers for AI capacity. The result is a global buildout estimated at over $300 billion through 2027, making AI data centers one of the largest infrastructure investments in history.
