Infrastructure

Edge AI

On-Device AI, Local AI
Running AI models directly on end-user devices (phones, laptops, IoT sensors, cars) rather than in the cloud. Edge AI means your data never leaves the device, latency is near zero (no network round trip), and the model works offline. Apple Intelligence, Google's on-device Gemini Nano, and local LLM runners like llama.cpp and Ollama are all Edge AI.

Why It Matters

Edge AI is where privacy, latency, and cost intersect. Cloud AI means sending your data to someone else's server, waiting for the response, and paying per token. Edge AI means instant, private inference that costs nothing after the download. The trade-off is model size: edge devices have limited memory, so on-device models are smaller and less capable than cloud models. But for many tasks, a fast 3B model on your phone beats a slow 400B model in a data center.

Deep Dive

The key constraint for edge AI is memory. A phone might have 6–12 GB of RAM shared between the OS, apps, and the model. A laptop might have 8–32 GB. This limits model size: a 3B parameter model at 4-bit quantization needs about 1.5 GB, feasible on a phone. A 7B model needs about 4 GB, feasible on a decent laptop. Anything larger requires aggressive quantization or offloading to disk (slow).
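A quick back-of-the-envelope check of those figures (a sketch in Python; the ~20% runtime overhead factor for KV cache and buffers is an assumption, not a measurement):

```python
def model_memory_gb(params_billions: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough memory estimate: parameter count times bits per weight,
    plus ~20% for KV cache and runtime buffers (assumed overhead)."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * 1e9 * bytes_per_weight * overhead / 1e9

print(f"3B @ 4-bit:  {model_memory_gb(3, 4):.1f} GB")   # ~1.8 GB -> phone territory
print(f"7B @ 4-bit:  {model_memory_gb(7, 4):.1f} GB")   # ~4.2 GB -> decent laptop
print(f"70B @ 4-bit: {model_memory_gb(70, 4):.1f} GB")  # ~42 GB -> needs offloading
```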

The Apple Silicon Effect

Apple's M-series chips (M1–M4) with unified memory architecture made edge AI practical for laptops. Unlike discrete GPU setups where model weights must fit in VRAM, Apple Silicon shares memory between CPU and GPU, so a MacBook with 32 GB unified memory can run a 24B model at 4-bit quantization smoothly. This, combined with llama.cpp's Metal optimization, created the local LLM movement.
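To make this concrete, here is a minimal sketch using the llama-cpp-python bindings on macOS (the model filename is a placeholder; any 4-bit GGUF file that fits in unified memory works):

```python
from llama_cpp import Llama

# Load a 4-bit-quantized GGUF model. n_gpu_layers=-1 offloads every layer
# to the GPU via Metal; this works because the weights sit in the same
# unified memory the GPU already sees, with no copy into separate VRAM.
llm = Llama(
    model_path="./model-24b-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload all layers to Metal on Apple Silicon
    n_ctx=4096,       # context window; larger values use more memory
)

out = llm("Explain unified memory in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```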

Beyond Text

Edge AI isn't limited to language models. On-device speech recognition (Whisper), image classification, real-time translation, and predictive text all run locally. The trend is toward NPUs (Neural Processing Units) — dedicated AI accelerator chips built into phones and laptops that handle AI workloads more efficiently than general-purpose CPU/GPU. Apple's Neural Engine, Qualcomm's Hexagon, and Intel's NPU are all examples.
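Running one of these locally is often just a few lines; a sketch using the open-source whisper package (the audio path is a placeholder):

```python
import whisper

# Weights download once; after that, transcription is fully local and the
# audio never leaves the machine.
model = whisper.load_model("base.en")  # ~140 MB English-only model

result = model.transcribe("meeting.wav")  # placeholder audio file
print(result["text"])
```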
