Apple's approach is layered: a ~3B parameter on-device model handles quick tasks (smart replies, text editing, basic summarization) entirely on the device's Neural Engine. More complex tasks go to Apple's Private Cloud Compute — servers running Apple Silicon that process requests without retaining user data and are subject to independent security audits. Tasks beyond Apple's capabilities (like deep research questions) can be routed to third-party models with explicit user permission.
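The three-tier dispatch described above can be sketched as a simple routing function. This is purely illustrative: the task labels, tier names, and consent flag are hypothetical, since Apple's actual routing criteria are not public.

```python
from enum import Enum, auto

class Tier(Enum):
    ON_DEVICE = auto()       # ~3B model on the Neural Engine
    PRIVATE_CLOUD = auto()   # Apple Silicon servers, no data retention
    THIRD_PARTY = auto()     # external models, explicit opt-in only

# Hypothetical task categories for illustration.
SIMPLE_TASKS = {"smart_reply", "text_edit", "short_summary"}
COMPLEX_TASKS = {"long_summary", "multi_step_request"}

def route(task: str, user_approved_third_party: bool = False) -> Tier:
    """Sketch of the layered dispatch: on-device first, then private
    cloud, then (with consent) a third-party model."""
    if task in SIMPLE_TASKS:
        return Tier.ON_DEVICE
    if task in COMPLEX_TASKS:
        return Tier.PRIVATE_CLOUD
    # Anything beyond the first two tiers needs explicit user permission.
    if user_approved_third_party:
        return Tier.THIRD_PARTY
    raise PermissionError("third-party routing requires explicit user consent")
```

The key design property the sketch captures is that escalation is one-way and gated: a request only leaves the device when the on-device model can't handle it, and only leaves Apple's infrastructure with a per-request opt-in.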
Apple's privacy architecture for cloud AI is technically ambitious: the servers run on Apple Silicon (the same architecture as its devices), the software images are published for independent verification, requests are encrypted end-to-end, and Apple claims it has no ability to access user data even on its own servers. This is a meaningfully different privacy model from "trust us with your data" — it's "verify that we can't see your data." Whether it fully delivers on this promise is subject to ongoing security research.