LM Studio and Apple run a trillion-parameter model on four Mac Studios, Zubnet AI News

You can now run a trillion-parameter AI model without touching the cloud, on a cluster of Macs sitting on a desk. LM Studio said it worked with Apple to run Kimi K2.6, Moonshot AI's open-weight model with roughly a trillion parameters, across four linked Mac Studios. The demo used a preview build of LM Studio's software and included secure remote access. Shown around Apple's WWDC 2026, it marks how far local, on-premises inference of frontier-scale models has come.

The setup leans on two things Apple has been quietly building toward. The first is memory: four Mac Studios joined over Thunderbolt 5 offer roughly 1.5 terabytes of unified memory between them. That is enough to hold the weights of a trillion-parameter model, a job that would otherwise demand a rack of datacenter GPUs. The second is a new macOS capability, RDMA (remote direct memory access) over Thunderbolt 5, which lets the machines move data between one another fast enough to behave as a single system. Reported throughput for Kimi K2 on such a cluster is around 25 tokens per second, which is usable for real work. The hardware costs roughly $40,000: a lot for an individual, and very little next to an equivalent GPU server.
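
As a rough check on the memory claim, the sketch below computes how much space the raw weights need at a few numeric precisions. It assumes exactly 10^12 parameters, decimal terabytes, and an even split across the four machines. The 16-, 8-, and 4-bit precisions are illustrative only, since the article does not say which quantization the demo used. The sketch also counts weights alone; a real deployment needs extra headroom for the KV cache, activations, and the operating system.

```python
# Hypothetical back-of-the-envelope check: do ~1 trillion parameters fit in
# ~1.5 TB of combined unified memory across four machines?
# Parameter count and memory size are the article's round numbers; the
# precisions are illustrative assumptions, not LM Studio's actual settings.

PARAMS = 1_000_000_000_000        # "roughly a trillion parameters"
CLUSTER_MEMORY_BYTES = 1.5e12     # "roughly 1.5 terabytes" (decimal TB)
NODES = 4                         # four Mac Studios


def weight_bytes(params: int, bits_per_param: float) -> float:
    """Bytes for the raw weights only (no KV cache, activations, or OS)."""
    return params * bits_per_param / 8


def fits(params: int, bits_per_param: float, memory_bytes: float) -> bool:
    """True if the raw weights alone fit in the given memory budget."""
    return weight_bytes(params, bits_per_param) <= memory_bytes


if __name__ == "__main__":
    for bits in (16, 8, 4):
        total = weight_bytes(PARAMS, bits)
        per_node = total / NODES  # assumes weights are split evenly
        print(f"{bits:>2}-bit: {total / 1e12:.2f} TB total, "
              f"{per_node / 1e9:.0f} GB per node, "
              f"fits={fits(PARAMS, bits, CLUSTER_MEMORY_BYTES)}")
```

Under these assumptions, 16-bit weights (2 TB) would not fit, while 8-bit (1 TB) and 4-bit (0.5 TB) would. This is why quantized weights are what make a four-machine cluster of this size workable.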

For Apple, this is a positioning move. At WWDC 2026 it pitched the Mac Studio as a serious local-AI workstation, citing large gains in token generation on its newest chips running models through LM Studio. For the open-weights world, it is something bigger: the fact that a frontier-size model like Kimi K2.6 ships with open weights at all is what makes running it on your own hardware possible. Closed models from the big labs cannot be downloaded onto a desk; open ones can. That difference is now the difference between renting intelligence and owning the machine that runs it.

The significance ties into the cost story playing out across the rest of AI. Cloud inference is metered, and the bill scales with how much you use it; a model running locally has a fixed, upfront cost and, apart from electricity, no per-token meter at all. For privacy-sensitive or high-volume work, that math is starting to favor the desk. The honest caveats: 25 tokens per second is fine for a single user but not for serving many, $40,000 is a real barrier, and vendor throughput claims deserve the usual skepticism. Still, the direction is hard to miss. The frontier used to live only in datacenters, and a trillion parameters now fits, slowly but genuinely, on a cluster of computers you can buy and unplug.
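
The fixed-versus-metered comparison can be made concrete with a little arithmetic. This sketch uses the article's round figures ($40,000 and 25 tokens per second) together with several assumptions of its own: a single generation stream, a three-year amortization period, and hypothetical utilization levels. It ignores electricity, maintenance, and any extra aggregate throughput from batching requests. It returns the cloud price per million tokens above which the cluster's hardware pays for itself.

```python
# Hypothetical break-even sketch: at what cloud price per 1M tokens does a
# fixed hardware cost equal metered spend for the same number of tokens?
# Assumes one generation stream at a steady rate; ignores power, upkeep,
# and batching. Utilization and amortization period are illustrative.

SECONDS_PER_YEAR = 365 * 24 * 3600


def tokens_generated(tokens_per_sec: float, years: float,
                     utilization: float) -> float:
    """Total tokens produced over the period at the given duty cycle."""
    return tokens_per_sec * SECONDS_PER_YEAR * years * utilization


def break_even_price_per_million(hardware_cost: float, tokens_per_sec: float,
                                 years: float, utilization: float) -> float:
    """Cloud price ($ per 1M tokens) at which metered spend equals hardware cost."""
    total_tokens = tokens_generated(tokens_per_sec, years, utilization)
    return hardware_cost / (total_tokens / 1e6)


if __name__ == "__main__":
    for util in (1.0, 0.25):
        price = break_even_price_per_million(40_000, 25, years=3,
                                             utilization=util)
        print(f"utilization {util:.0%}: break-even at ${price:.2f} per 1M tokens")
```

Running around the clock for three years, the threshold works out to roughly $17 per million tokens; at 25% utilization it rises to roughly $68. The less the machines are kept busy, the higher a cloud price has to be before owning the hardware wins on cost alone, which is why the high-volume qualifier matters.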
