
Ollama

A friendly tool for running language models locally with a single command. Ollama wraps llama.cpp in a Docker-like experience: ollama run llama3 downloads and runs Llama 3, automatically picking a suitable quantization for your hardware. It manages model downloads, provides an API server, and handles hardware detection.
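Day-to-day use is a handful of CLI commands. A typical session might look like this (these require Ollama to be installed; the mistral model name is just an example):

```shell
# Download Llama 3 (if needed) and start an interactive chat
ollama run llama3

# Pull a model without starting a chat
ollama pull mistral

# List locally downloaded models, then remove one
ollama list
ollama rm mistral
```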

Why It Matters

Ollama is to local AI what Docker is to containerization: it removes friction. Before Ollama, running a local model meant picking a quantization level, downloading GGUF files, tuning llama.cpp parameters, and managing GPU offload. Ollama handles all of this automatically. It is the fastest path from "I'd like to try running AI locally" to actually doing it.

Deep Dive

Ollama maintains a registry of models (similar to Docker Hub) where popular models are available in pre-configured quantizations. Running ollama pull mistral downloads Mistral-7B at a reasonable quantization for your system. The tool detects your hardware (CPU, Apple Silicon, NVIDIA GPU) and configures inference accordingly. It exposes an HTTP API on localhost:11434 that's compatible with many AI tools and frameworks.
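The HTTP API can be called from any language. A minimal Python sketch, using only the standard library and Ollama's documented /api/generate endpoint (the model name and prompt are illustrative; actually sending the request requires a running Ollama server):

```python
import json
from urllib import request

def build_generate_request(model: str, prompt: str) -> request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generate_request("llama3", "Why is the sky blue?")
# With a local Ollama server running, send it like this:
#   resp = request.urlopen(req)
#   print(json.loads(resp.read())["response"])
```

Setting "stream": False returns one complete JSON object; by default the endpoint streams partial responses line by line.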

Modelfile

Ollama's "Modelfile" concept lets you customize models by specifying a base model, system prompt, temperature, and other parameters — like a Dockerfile for AI models. You can create custom variants: ollama create my-assistant -f Modelfile. This makes it easy to experiment with different system prompts and parameters without touching model weights.
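A minimal Modelfile might look like this (directives follow Ollama's Modelfile format; the system prompt and temperature value are illustrative):

```
FROM llama3
SYSTEM """You are a concise assistant that answers in plain English."""
PARAMETER temperature 0.2
```

Build and run the variant with `ollama create my-assistant -f Modelfile`, then `ollama run my-assistant`.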

The Local AI Stack

Ollama is typically one layer in a local AI stack: Ollama for model serving, Open WebUI for a chat interface, and various tools that connect via the API (Continue for IDE integration, LangChain for application frameworks). This stack gives you a fully private, cost-free AI setup that runs entirely on your hardware. For privacy-sensitive applications and development work, it's increasingly competitive with cloud APIs.

Related Concepts
