Apple is making a concentrated bet on artificial intelligence that runs on your device rather than in a data center, and two threads from its WWDC announcements show how deliberate that bet is. One is a new developer framework called Core AI. The other is a quieter detail about how the next Siri actually uses Google's Gemini. Together they describe a company trying to own the models on its hardware while borrowing a rival's model only to teach its own.

Core AI is the successor to Core ML, Apple's long-standing on-device machine-learning framework, and it is built for the generative era. It lets apps run large language models and generative AI entirely on-device, with no server dependency and no token costs, supporting both custom PyTorch models and pre-optimized open-source ones through a memory-safe Swift API. Apple says it handles models ranging from compact 3-billion-parameter vision models to large reasoning models of up to 70 billion parameters, with ahead-of-time compilation for instant load times and generative-AI optimizations such as KV-cache management, autoregressive decoding, and Metal 4 kernels purpose-built for attention. It runs across iPhone, iPad, Mac, and Apple Vision Pro, is available to developers now in the Xcode 27 beta, and is due in production releases in the fall.
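
For readers unfamiliar with why KV-cache management matters in autoregressive decoding, here is a toy sketch in plain Python. It is not Core AI's API, which the announcement does not detail; the weight matrices, the `KVCache` class, and the helper names are all hypothetical. It shows a single attention head producing the same output with and without a cache, while the cached version projects each token's key and value only once.

```python
import math

# Hypothetical 2x2 projection weights for a single toy attention head.
W_Q = [[0.5, -0.1], [0.2, 0.3]]
W_K = [[0.1, 0.4], [-0.3, 0.2]]
W_V = [[0.7, 0.0], [0.1, -0.5]]


def project(vec, matrix):
    """Multiply a vector by a small dense matrix given as a list of rows."""
    return [sum(x * w for x, w in zip(vec, row)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all stored keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class KVCache:
    """Keeps keys and values of earlier tokens so they are projected once."""

    def __init__(self):
        self.keys = []
        self.values = []
        self.projections = 0  # count of key/value projections performed

    def step(self, x):
        self.keys.append(project(x, W_K))
        self.values.append(project(x, W_V))
        self.projections += 2
        return attend(project(x, W_Q), self.keys, self.values)


def step_without_cache(history):
    """Recompute every key and value from scratch for the newest token."""
    keys = [project(x, W_K) for x in history]
    values = [project(x, W_V) for x in history]
    return attend(project(history[-1], W_Q), keys, values), 2 * len(history)


def run(sequence):
    """Decode token by token both ways; return (cached, uncached) projection counts."""
    cache = KVCache()
    uncached_projections = 0
    for i, x in enumerate(sequence):
        cached_out = cache.step(x)
        plain_out, cost = step_without_cache(sequence[: i + 1])
        uncached_projections += cost
        assert all(abs(a - b) < 1e-12 for a, b in zip(cached_out, plain_out))
    return cache.projections, uncached_projections


TOKENS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0]]
print(run(TOKENS))  # 8 projections with the cache versus 20 without, for four tokens
```

The cached cost grows linearly with sequence length while the uncached cost grows quadratically, which is why a framework aimed at generative models would want to manage this cache for the developer.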

The Siri detail is subtler and, in some ways, more telling. According to an analysis of what the keynote left unsaid, Apple uses Gemini as a teacher rather than an engine. Gemini generates training data and learning signals that are distilled into Apple's own third-generation Foundation Models, a step that happens once during development, while the models that actually answer your requests run on the device. Gemini is reached in the cloud only as a fallback, for the minority of requests that exceed what the on-device model can handle.
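
To make "teacher rather than engine" concrete, here is a minimal sketch of one common form of distillation, in which a student is trained to match a teacher's temperature-softened output distribution. The reporting does not say which form of distillation Apple uses, so the loss, the temperature, the learning rate, and the toy logits below are all assumptions for illustration, not Apple's or Google's method.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def distill(teacher_logits, student_logits, steps=200, learning_rate=0.5, temperature=2.0):
    """Gradient descent on the student's logits toward the teacher's distribution."""
    student = list(student_logits)
    p = softmax(teacher_logits, temperature)
    for _ in range(steps):
        q = softmax(student, temperature)
        # The gradient of the T^2-scaled KL loss for each student logit is T * (q - p).
        student = [
            s - learning_rate * temperature * (qi - pi)
            for s, qi, pi in zip(student, q, p)
        ]
    return student


TEACHER = [2.0, 1.0, 0.1]   # hypothetical teacher logits for one training example
STUDENT = [0.1, 1.0, 2.0]   # hypothetical untrained student logits

before = distillation_loss(TEACHER, STUDENT)
after = distillation_loss(TEACHER, distill(TEACHER, STUDENT))
print(f"loss before: {before:.4f}, loss after: {after:.2e}")
```

Once the loop finishes, the teacher is no longer needed: the student carries what it learned and can answer on its own, which is the property that makes a training-time dependency easier to unwind than a runtime one.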

That distinction is the whole point. A training-time teacher is a one-time and reversible dependency, the kind Apple could swap out or wean itself off later, whereas a runtime engine would be structural lock-in that touches privacy, latency, and the cost of every single query. Read that way, the headline that Apple now depends on Gemini overstates things: it is a hierarchy of concessions rather than a capitulation, with Apple keeping the part that matters most, on-device inference on its own models, and ceding only a cloud fallback. Apple has disclosed no figures for the arrangement. The Information has reported, without confirmation from Apple, that some of that cloud inference may run on Nvidia B200 chips inside Google data centers.

This is worth watching because of the direction it points. Running models from a few billion parameters up to 70 billion locally, at zero token cost, and training them by distilling knowledge out of bigger frontier models, together make up one of the most consequential bets in AI right now, because they pull capability back onto the device and out of the metered cloud. Apple has the silicon and the scale to push that bet further than almost anyone. The honest caveats are that production releases do not arrive until the fall, that real-world performance of large on-device models is still an open question, and that teacher-not-engine is partly Apple's own framing of a relationship it would rather downplay. But owned, on-device models taught by distillation are exactly where a lot of the interesting work is heading, and Apple just gave developers the framework to build on them.
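
Some back-of-the-envelope arithmetic shows why performance at the large end is the open question. The sketch below estimates the memory needed for model weights alone at a few precisions. The bit widths are assumptions, since nothing announced says how Core AI quantizes models, and the figures exclude the KV cache, activations, and runtime overhead.

```python
def weight_footprint_gb(parameters, bits_per_weight):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


# Parameter counts from the announced range; precisions are illustrative assumptions.
SIZES = {"3B": 3e9, "70B": 70e9}
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def footprint_table():
    """Map (model size, precision) to the estimated weight footprint in GB."""
    return {
        (name, label): weight_footprint_gb(params, bits)
        for name, params in SIZES.items()
        for label, bits in PRECISIONS.items()
    }


for (name, label), gigabytes in footprint_table().items():
    print(f"{name} at {label}: {gigabytes:.1f} GB")
```

Under these assumptions a 3-billion-parameter model needs about 1.5 GB at 4 bits per weight, while a 70-billion-parameter model needs about 35 GB at 4 bits and 140 GB at 16 bits, so the top of the range only fits on devices with a great deal of memory.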