Ollama: Definition & Meaning — AI Wiki

Uma ferramenta amigável ao usuário para rodar modelos de linguagem localmente com um único comando. Ollama empacota o llama.cpp em uma experiência tipo Docker: ollama run llama3 baixa e roda o Llama 3, selecionando automaticamente a quantização certa para seu hardware. Lida com downloads de modelo, provê um servidor API e lida com detecção de hardware.

Por que importa

Ollama é para IA local o que Docker é para containerização: ele removeu a fricção. Antes do Ollama, rodar um modelo local significava escolher níveis de quantização, baixar arquivos GGUF, configurar flags do llama.cpp e gerenciar offloading de GPU. Ollama lida com tudo isso automaticamente. É o caminho mais rápido de “quero experimentar rodar IA localmente” para realmente fazer isso.

Deep Dive

Ollama maintains a registry of models (similar to Docker Hub) where popular models are available in pre-configured quantizations. Running ollama pull mistral downloads Mistral-7B at a reasonable quantization for your system. The tool detects your hardware (CPU, Apple Silicon, NVIDIA GPU) and configures inference accordingly. It exposes an HTTP API on localhost:11434 that's compatible with many AI tools and frameworks.

Modelfile

Ollama's "Modelfile" concept lets you customize models by specifying a base model, system prompt, temperature, and other parameters — like a Dockerfile for AI models. You can create custom variants: ollama create my-assistant -f Modelfile. This makes it easy to experiment with different system prompts and parameters without touching model weights.

The Local AI Stack

Ollama is typically one layer in a local AI stack: Ollama for model serving, Open WebUI for a chat interface, and various tools that connect via the API (Continue for IDE integration, LangChain for application frameworks). This stack gives you a fully private, cost-free AI setup that runs entirely on your hardware. For privacy-sensitive applications and development work, it's increasingly competitive with cloud APIs.

Ollama

Por que importa

Deep Dive

Modelfile

The Local AI Stack

Conceitos relacionados