Infrastructure

Edge AI

On-Device AI, Local AI
Running AI models directly on end-user devices (phones, laptops, IoT sensors, cars) rather than in the cloud. Edge AI means your data never leaves the device, latency is near zero (no network round trip), and models stay available offline. Apple Intelligence, Google's on-device Gemini Nano, and local LLM runners like llama.cpp and Ollama are all Edge AI.

Why It Matters

Edge AI is where privacy, latency, and cost intersect. Cloud AI means sending your data to someone else's servers, waiting for a response, and paying per token. Edge AI means inference that is instant, private, and free once the model is downloaded. The trade-off is model size: edge devices have limited memory, so on-device models are smaller and less capable than cloud models. But for many tasks, a fast 3B model on your phone beats a slow 400B model in a data center.

Deep Dive

The key constraint for edge AI is memory. A phone might have 6–12 GB of RAM shared between the OS, apps, and the model. A laptop might have 8–32 GB. This limits model size: a 3B parameter model at 4-bit quantization needs about 1.5 GB, feasible on a phone. A 7B model needs about 4 GB, feasible on a decent laptop. Anything larger requires aggressive quantization or offloading to disk (slow).
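A quick back-of-the-envelope calculation makes these numbers concrete. The sketch below is a rough estimate, not a benchmark: the 20% overhead factor for KV cache, activations, and runtime buffers is an assumption, and real usage varies with context length.

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough RAM needed to run a quantized model.

    overhead is an assumed ~20% margin for KV cache, activations,
    and runtime buffers; actual usage depends on context length.
    """
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes / 1024**3 * overhead

# The figures from the text: weights alone are ~1.4 GB and ~3.3 GB
print(f"3B @ 4-bit: ~{model_memory_gb(3, 4):.1f} GB")  # ~1.7 GB, fits on a phone
print(f"7B @ 4-bit: ~{model_memory_gb(7, 4):.1f} GB")  # ~3.9 GB, fits on a laptop
```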

The Apple Silicon Effect

Apple's M-series chips (M1–M4) with unified memory architecture made edge AI practical for laptops. Unlike discrete GPU setups where model weights must fit in VRAM, Apple Silicon shares memory between CPU and GPU, so a MacBook with 32 GB unified memory can run a 24B model at 4-bit quantization smoothly. This, combined with llama.cpp's Metal optimization, created the local LLM movement.
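To make this concrete, here is a minimal sketch of running a local GGUF model through the llama-cpp-python bindings, which use llama.cpp's Metal backend on Apple Silicon when layers are offloaded to the GPU. The model filename is a hypothetical placeholder for any 4-bit quantized GGUF file; treat the snippet as an illustration, not a tuned setup.

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# On Apple Silicon, offloaded layers run on the GPU via the Metal backend.
from llama_cpp import Llama

llm = Llama(
    model_path="./model-24b-q4_k_m.gguf",  # hypothetical path to a 4-bit GGUF file
    n_gpu_layers=-1,                       # -1 = offload every layer to the GPU
    n_ctx=4096,                            # context window size
)

out = llm("Explain unified memory in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```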

Beyond Text

Edge AI isn't limited to language models. On-device speech recognition (Whisper), image classification, real-time translation, and predictive text all run locally. The trend is toward NPUs (Neural Processing Units) — dedicated AI accelerator chips built into phones and laptops that handle AI workloads more efficiently than general-purpose CPU/GPU. Apple's Neural Engine, Qualcomm's Hexagon, and Intel's NPU are all examples.
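As one example of non-LLM edge AI, the snippet below transcribes audio entirely on-device with the open-source openai-whisper package (the audio filename is a placeholder). It requires ffmpeg and downloads the model weights once; after that, no network access is needed.

```python
# Fully local speech recognition with openai-whisper (pip install openai-whisper).
# Requires ffmpeg; the audio filename below is a hypothetical placeholder.
import whisper

model = whisper.load_model("base")        # ~74M parameters, runs on a laptop CPU
result = model.transcribe("meeting.mp3")  # inference happens locally, no API call
print(result["text"])
```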
