
Edge AI

On-Device AI, Local AI
Running AI models directly on end-user devices — phones, laptops, IoT sensors, cars — rather than in the cloud. Edge AI means your data never leaves your device, latency is near-zero (no network round-trip), and the model works offline. Apple Intelligence, Google's on-device Gemini Nano, and local LLM runners like llama.cpp and Ollama are all edge AI.

Why it matters

Edge AI is where privacy, latency, and cost intersect. Cloud AI means sending your data to someone else's server, waiting for a response, and paying per token. Edge AI means instant, private, free-after-download inference. The trade-off is model size: edge devices have limited memory, so on-device models are smaller and less capable than cloud models. But for many tasks, a fast 3B model on your phone beats a slow 400B model in a data center.

Deep Dive

The key constraint for edge AI is memory. A phone might have 6–12 GB of RAM shared between the OS, apps, and the model. A laptop might have 8–32 GB. This limits model size: a 3B-parameter model at 4-bit quantization needs about 1.5 GB for its weights, feasible on a phone. A 7B model needs about 4 GB once runtime overhead is included, feasible on a decent laptop. Anything larger requires more aggressive quantization or offloading to disk, which is slow.
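The memory math above is just parameters times bits per weight. A minimal sketch (the function name is illustrative, and this counts weights only; KV cache, activations, and the runtime add more on top):

```python
def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough weight-only footprint in decimal GB.

    Real usage is higher: the KV cache, activations, and the
    inference runtime all consume memory beyond the weights.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(model_memory_gb(3, 4))   # 3B at 4-bit -> 1.5 GB of weights
print(model_memory_gb(7, 4))   # 7B at 4-bit -> 3.5 GB of weights
```

Dropping from 16-bit to 4-bit quantization cuts the footprint by 4x, which is why nearly all on-device models ship quantized.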

The Apple Silicon Effect

Apple's M-series chips (M1–M4) with unified memory architecture made edge AI practical for laptops. Unlike discrete GPU setups where model weights must fit in VRAM, Apple Silicon shares memory between CPU and GPU, so a MacBook with 32 GB unified memory can run a 24B model at 4-bit quantization smoothly. This, combined with llama.cpp's Metal optimization, created the local LLM movement.
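The 32 GB figure checks out on the back of an envelope. A sketch of the headroom calculation; the allowance for the OS and other apps is an illustrative assumption, not a measurement:

```python
# 24B parameters at 4 bits per weight, in decimal GB.
weights_gb = 24e9 * 4 / 8 / 1e9        # 12.0 GB of weights
unified_memory_gb = 32                 # MacBook configuration from the text
os_and_apps_gb = 8                     # illustrative allowance (assumption)

headroom_gb = unified_memory_gb - weights_gb - os_and_apps_gb
print(headroom_gb)                     # left over for KV cache and activations
```

On a discrete-GPU machine, the same 12 GB of weights would have to fit entirely in VRAM, which is exactly the constraint unified memory removes.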

Beyond Text

Edge AI isn't limited to language models. On-device speech recognition (Whisper), image classification, real-time translation, and predictive text all run locally. The trend is toward NPUs (Neural Processing Units) — dedicated AI accelerator chips built into phones and laptops that handle AI workloads more efficiently than a general-purpose CPU or GPU. Apple's Neural Engine, Qualcomm's Hexagon, and Intel's NPU are all examples.
