Tools

Ollama

A friendly tool for running language models locally with a single command. Ollama wraps llama.cpp in a Docker-like experience: ollama run llama3 downloads and runs Llama 3, automatically selecting a suitable quantization for your hardware. It manages model downloads, serves an HTTP API, and handles hardware detection.
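For example, a first session needs only a couple of commands (the model name here is one of many available in the Ollama registry):

```
# Download Llama 3 (if not already cached) and open an interactive chat
ollama run llama3

# See which models are stored locally
ollama list
```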

Why It Matters

Ollama 对本地 AI 就像 Docker 对容器化:它移除了阻力。在 Ollama 之前,跑本地模型意味着选量化级别、下载 GGUF 文件、配置 llama.cpp 参数、管理 GPU offload。Ollama 自动处理所有这些。从“我想试着在本地跑 AI”到真正做到,它是最快的路径。

Deep Dive

Ollama maintains a registry of models (similar to Docker Hub) where popular models are available in pre-configured quantizations. Running ollama pull mistral downloads Mistral-7B at a reasonable quantization for your system. The tool detects your hardware (CPU, Apple Silicon, NVIDIA GPU) and configures inference accordingly. It exposes an HTTP API on localhost:11434 that's compatible with many AI tools and frameworks.
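As a sketch, you can exercise that API directly with curl (this assumes a prior ollama pull mistral; with "stream": false the server returns a single JSON object whose "response" field holds the completion):

```
# One-shot generation against the local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```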

Modelfile

Ollama's "Modelfile" concept lets you customize models by specifying a base model, system prompt, temperature, and other parameters — like a Dockerfile for AI models. You can create custom variants: ollama create my-assistant -f Modelfile. This makes it easy to experiment with different system prompts and parameters without touching model weights.

The Local AI Stack

Ollama is typically one layer in a local AI stack: Ollama for model serving, Open WebUI for a chat interface, and various tools that connect via the API (Continue for IDE integration, LangChain for application frameworks). This stack gives you a fully private, cost-free AI setup that runs entirely on your hardware. For privacy-sensitive applications and development work, it's increasingly competitive with cloud APIs.
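Many of these tools speak the OpenAI API rather than Ollama's native one; Ollama also serves an OpenAI-compatible endpoint under /v1, so connecting such a client is usually just a base-URL change. A sketch with curl (assumes a pulled llama3 model; OpenAI client libraries additionally require an API key, which Ollama ignores):

```
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Summarize what Ollama does."}]
  }'
```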
