Google unveiled its eighth-generation TPUs at Cloud Next 2026, confirming the architectural shift that had been rumored for a year: splitting training and inference workloads onto separate chips. TPU 8t is for training, TPU 8i for inference. Each is optimized for the specific bottlenecks of its half of the AI workload; training wants raw throughput and interconnect bandwidth across giant pods, while inference wants low latency and memory-access locality for autoregressive decoding.
TPU 8t pods are 9,600 chips, up from Ironwood's 9,216, connected through a 3D torus network. The architectural additions are SparseCore (acceleration for sparse ops, which dominate in MoE models) and native four-bit floating-point support (reducing memory-bandwidth pressure and increasing effective throughput per memory byte). Google claims 2.7x the performance per dollar of Ironwood for large-scale training and 2x the performance per watt of the previous generation. Detailed FLOPS numbers and HBM specs are not public yet.
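The bandwidth argument for four-bit floats is just arithmetic, and worth making concrete. A minimal sketch, using a hypothetical 70B-parameter model (the model size is an assumption for illustration, not a TPU 8t spec):

```python
# Back-of-envelope: why 4-bit floats cut memory-bandwidth pressure.
# The model size below is hypothetical; the ratios are the point.

def weight_bytes(n_params: int, bits: int) -> int:
    """Bytes of memory traffic to stream a model's weights once."""
    return n_params * bits // 8

N = 70_000_000_000  # hypothetical 70B-parameter model

bf16 = weight_bytes(N, 16)  # 140 GB per pass
fp8  = weight_bytes(N, 8)   #  70 GB per pass
fp4  = weight_bytes(N, 4)   #  35 GB per pass

# Per byte of memory traffic, FP4 moves 4x the parameters of BF16,
# so a bandwidth-bound matmul sees up to 4x effective throughput.
print(bf16 // fp4)  # 4
print(fp8 // fp4)   # 2
```

When a kernel is bound by memory bandwidth rather than compute, halving the bits per weight roughly doubles effective throughput, which is why the jump from FP8 to FP4 matters even before any FLOPS increase.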
TPU 8i is the more interesting architectural move. Pod size tops out at 1,152 chips using a new interconnect topology called Boardfly ICI. The chip has three times the SRAM of Ironwood, a design choice aimed at keeping KV cache and activations on-chip for lower-latency autoregressive decoding. There is a Collectives Acceleration Engine specifically for the all-reduce and all-to-all patterns that dominate inference, and Boardfly reduces the hops required for all-to-all communication by up to 50%. Google's claims for the inference chip: 80% better performance per dollar than Ironwood at low-latency targets, and 2x the performance per watt of the previous generation.
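To see why SRAM capacity is the lever for decode latency, it helps to size the KV cache that autoregressive decoding re-reads on every token. A minimal sketch with a hypothetical grouped-query-attention config (all dimensions are illustrative; Google has not published TPU 8i SRAM figures):

```python
# Back-of-envelope: KV cache footprint during autoregressive decoding.
# Every generated token re-reads the whole cache, so decode is typically
# memory-bandwidth-bound; cache that fits in SRAM skips HBM round trips.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Total bytes for K and V tensors across all layers and sequences."""
    # factor 2: one K tensor and one V tensor per layer
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class model with grouped-query attention, bf16 cache
cache = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                       seq_len=8192, batch=8)
print(cache / 2**30)  # 20.0 (GiB)
```

Twenty gibibytes for a modest batch is far beyond any on-chip SRAM, so in practice the win is keeping the hottest slices (recent tokens, current-layer K/V) resident on-chip while the bulk stays in HBM; tripling SRAM widens how much of that working set avoids the HBM round trip per decoded token.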
Two things worth registering for builders. One, the training-versus-inference split at the silicon level is the hardware acknowledgment of what every LLM serving paper has been saying for two years: prefill and decode, training and serving, have different compute and memory profiles and benefit from different silicon. Anthropic's Amazon Trainium deal (1 million-plus chips deployed, 5 gigawatts over the decade) shows the same logic playing out on Amazon's silicon; now Google is splitting the same way. Two, the Thinking Machines Lab multi-billion-dollar deal with Google Cloud the same week, for NVIDIA GB300 chips, is the consistent signal: Google sells both its own silicon and NVIDIA's through the same cloud, because customers want the option. Custom silicon is winning margin, but not exclusivity yet.
