Infrastructure

Edge AI

On-Device AI, Local AI
Running AI models directly on user devices (phones, laptops, IoT sensors, cars) instead of in the cloud. Edge AI means your data never leaves your device, latency is near zero (no network round-trip), and the model works offline. Apple Intelligence, Google's on-device Gemini Nano, and local LLM runners like llama.cpp and Ollama are all Edge AI.
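As a concrete illustration, here is a minimal sketch of querying a model served locally by Ollama over its default HTTP endpoint. It assumes an Ollama server is already running on localhost:11434 and that the named model has been pulled; the model name and prompt are just examples. The point is that the prompt and response never leave the machine.

```python
import requests

# Query a locally running Ollama server. Assumes the server is up on
# its default port and that a model (e.g. "llama3.2") has been pulled.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",       # example model name
        "prompt": "What is edge AI?",
        "stream": False,           # return one JSON object, not a stream
    },
    timeout=120,
)
print(resp.json()["response"])     # inference happened entirely on-device
```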

Why it matters

Edge AI is where privacy, latency, and cost intersect. Cloud AI means sending your data to someone else's server, waiting for a response, and paying per token. Edge AI means instant, private, free-after-download inference. The trade-off is model size: edge devices have limited memory, so on-device models are smaller and less capable than cloud models. But for many tasks, a fast 3B model on your phone beats a slow 400B model in a data center.

Deep Dive

The key constraint for edge AI is memory. A phone might have 6–12 GB of RAM shared between the OS, apps, and the model. A laptop might have 8–32 GB. This limits model size: a 3B parameter model at 4-bit quantization needs about 1.5 GB, feasible on a phone. A 7B model needs about 4 GB, feasible on a decent laptop. Anything larger requires aggressive quantization or offloading to disk (slow).
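A rough back-of-the-envelope for those numbers: weight memory is approximately parameter count times bits per weight. Here is a minimal sketch of that arithmetic; it ignores the KV cache and runtime overhead, which add more on top.

```python
def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory: parameters x bits per weight, in GB.

    Ignores KV cache and runtime overhead, which add more on top.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# The cases discussed above, plus an unquantized 7B for contrast.
for params, bits in [(3, 4), (7, 4), (24, 4), (7, 16)]:
    print(f"{params}B @ {bits}-bit ~ {model_memory_gb(params, bits):.1f} GB")
# 3B @ 4-bit  ~ 1.5 GB   (fits on a phone)
# 7B @ 4-bit  ~ 3.5 GB   (fits on a decent laptop)
# 24B @ 4-bit ~ 12.0 GB  (needs 32 GB-class unified memory)
# 7B @ 16-bit ~ 14.0 GB  (why quantization matters on edge devices)
```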

The Apple Silicon Effect

Apple's M-series chips (M1–M4) with unified memory architecture made edge AI practical for laptops. Unlike discrete GPU setups where model weights must fit in VRAM, Apple Silicon shares memory between CPU and GPU, so a MacBook with 32 GB unified memory can run a 24B model at 4-bit quantization smoothly. This, combined with llama.cpp's Metal optimization, created the local LLM movement.
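As a sketch of what that looks like in practice, here is an example using the llama-cpp-python bindings to load a 4-bit GGUF model with all layers offloaded to the GPU (Metal on Apple Silicon). The model file name is a placeholder; any GGUF checkpoint that fits in unified memory would work.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a 4-bit quantized GGUF model. n_gpu_layers=-1 offloads every
# layer to the GPU, which on Apple Silicon means Metal over unified
# memory. The model path below is a placeholder.
llm = Llama(
    model_path="./mistral-small-24b-q4_k_m.gguf",  # hypothetical file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to Metal/GPU
)

out = llm("Explain unified memory in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```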

Beyond Text

Edge AI isn't limited to language models. On-device speech recognition (Whisper), image classification, real-time translation, and predictive text all run locally. The trend is toward NPUs (Neural Processing Units), dedicated AI accelerator chips built into phones and laptops that handle AI workloads more efficiently than general-purpose CPUs and GPUs. Apple's Neural Engine, Qualcomm's Hexagon, and Intel's NPU are all examples.
