
AI Wiki

AI concepts explained by builders, not textbooks. No jargon walls. No academic gatekeeping. Just clear, practical definitions of the terms you'll actually encounter.

44 terms · 7 categories · Updated March 2026
A
Agent
AI Agent
Tools
An AI system that can autonomously plan and execute multi-step tasks, using tools (web search, code execution, API calls) to achieve a goal. Unlike a simple chatbot that answers one question at a time, an agent decides what to do next based on what it's learned so far.
Why it matters: Agents are the bridge between "AI that talks" and "AI that does." When your AI can browse docs, write code, and test it without you holding its hand at every step — that's an agent.
Alignment
Safety
The challenge of making AI systems behave in ways that match human values and intentions. An aligned model does what you mean, not just what you said — and avoids harmful actions even when not explicitly told not to.
Why it matters: A model that's technically brilliant but poorly aligned is like a genius employee who follows instructions too literally. Alignment research is why models refuse dangerous requests and try to be genuinely helpful.
API
Application Programming Interface
Infrastructure
A structured way for software to talk to other software. In AI, this usually means sending a request (your prompt) to a provider's server and getting a response (the model's output) back. REST APIs over HTTPS are the standard.
Why it matters: Every AI provider — Anthropic, Google, Mistral — exposes their models through APIs. If you're building anything with AI beyond a chat window, you're using an API.
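As a sketch of what "sending a prompt to a provider's server" looks like in practice, here is how a chat-style request body might be assembled before being POSTed over HTTPS. The field names mirror common provider conventions but are illustrative, as is the model name — check your provider's API reference for the exact schema.

```python
import json

def build_chat_request(prompt: str, model: str, max_tokens: int = 256) -> dict:
    """Build a JSON payload for a chat-style AI API.

    The field names here follow common provider conventions but are
    illustrative -- each provider documents its own exact schema.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Explain APIs in one sentence.", model="example-model")
body = json.dumps(payload)  # this string is what gets POSTed to the endpoint
```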
Attention
Attention Mechanism, Self-Attention
Models
The core mechanism in Transformers that lets a model weigh which parts of the input are most relevant to each other. Instead of reading text left-to-right like older models, attention lets every word "look at" every other word simultaneously to understand context.
Why it matters: Attention is why modern LLMs understand that "bank" means different things in "river bank" vs. "bank account." It's also why longer context windows cost more — attention scales quadratically with sequence length.
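The core computation can be sketched in a few lines of pure Python: every query scores every key, and those scores become weights over the values. The all-pairs comparison in the inner loop is exactly why cost grows quadratically with sequence length. This is a toy illustration with tiny vectors, not a production implementation.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over toy vectors (pure Python).

    Each query is compared against every key -- this all-pairs scoring
    is the source of attention's quadratic cost in sequence length.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Softmax turns scores into weights that sum to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted mix of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

result = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```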
B
Benchmark
Training
A standardized test used to evaluate and compare AI models. Benchmarks measure specific capabilities — reasoning (ARC), math (GSM8K), coding (HumanEval), general knowledge (MMLU) — and produce scores that can be compared across models.
Why it matters: Benchmarks are how the industry keeps score, but they're imperfect. Models can be trained to ace benchmarks without being genuinely better. Real-world performance often tells a different story. Treat them as signals, not truth.
Bias
Safety
Systematic patterns in AI outputs that reflect or amplify societal prejudices present in training data. Bias can appear in text generation, image creation, hiring tools, and anywhere models make decisions that affect people differently.
Why it matters: If the training data says nurses are women and engineers are men, the model will perpetuate that. Bias isn't always obvious — it hides in word associations, default assumptions, and who gets represented.
C
Chain of Thought
CoT
Using AI
A prompting technique where you ask the model to show its reasoning step by step before giving a final answer. Instead of jumping to a conclusion, the model "thinks out loud," which dramatically improves accuracy on complex tasks.
Why it matters: Asking "explain your reasoning" isn't just for transparency — it actually makes models smarter. CoT reduced math errors by up to 50% in early studies. Most modern models now do this internally.
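In code, the technique is nothing more than wrapping your question in a reasoning instruction before sending it. The exact wording below is illustrative; any phrasing that asks for step-by-step reasoning before the answer has the same effect.

```python
def with_chain_of_thought(question: str) -> str:
    """Wrap a question in a chain-of-thought instruction.

    The wording is illustrative -- the key is asking for visible
    reasoning before the final answer.
    """
    return (
        f"{question}\n\n"
        "Think through this step by step, showing your reasoning, "
        "then give the final answer on its own line."
    )

prompt = with_chain_of_thought("A train leaves at 9:40 and arrives at 11:05. How long is the trip?")
```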
Context Window
Context Length
Using AI
The maximum amount of text (measured in tokens) a model can process in a single conversation. This includes both your input and the model's output. If a model has a 200K context window, that's roughly 150,000 words — about two novels.
Why it matters: Context window size determines what you can do. Summarize a whole codebase? Needs big context. Quick question-answer? Small is fine. But bigger isn't always better — models can lose focus in very long contexts.
Corpus
Dataset, Training Data
Training
The body of text (or other data) used to train a model. A corpus can range from curated collections of books and papers to massive scrapes of the entire internet. The quality and composition of the corpus fundamentally shape what the model knows and how it behaves.
Why it matters: Garbage in, garbage out. A model trained on Reddit talks differently than one trained on scientific papers. This is why we curated our own corpus for Sarah — generic web crawls produced confused, incoherent results.
D
Diffusion Model
Models
A type of generative model that creates images (or video, audio) by starting with pure noise and gradually removing it until a coherent output appears. The model learns to reverse the process of adding noise to real data. Stable Diffusion, DALL-E 3, and Midjourney all use variants of this approach.
Why it matters: Diffusion models dethroned GANs as the dominant image generation technique around 2022. They produce more diverse, controllable outputs and are the backbone of almost every image and video AI tool today.
E
Embedding
Vector Embedding
Training
A way to represent text (or images, or audio) as a list of numbers (a vector) that captures its meaning. Similar concepts end up close together in this number space — "cat" and "kitten" are nearby, while "cat" and "economics" are far apart.
Why it matters: Embeddings are the foundation of semantic search and RAG. They're how AI understands that a search for "fix login bug" should match a document about "authentication error resolution" even though no words overlap.
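"Close together in this number space" is usually measured with cosine similarity. Here is that measurement in pure Python, on made-up 3-dimensional toy vectors (real embeddings have hundreds or thousands of dimensions, produced by an embedding model).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.

    Returns ~1.0 for vectors pointing the same way (similar meaning)
    and ~0.0 for unrelated ones.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" -- the numbers are invented for illustration
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.9, 0.15]
economics = [0.1, 0.2, 0.95]
```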
Endpoint
Infrastructure
A specific URL where an AI API accepts requests. For example, Anthropic's message endpoint is where you send prompts to Claude. Different endpoints serve different functions: text generation, embeddings, image creation, model listing.
Why it matters: When integrating AI providers, endpoints are where the rubber meets the road. Each provider structures theirs differently, which is why platforms like Zubnet exist — to normalize the mess.
F
Fine-tuning
Training
Taking a pre-trained model and training it further on a smaller, specific dataset to specialize its behavior. Like taking a general practitioner and putting them through a surgical residency — same foundational knowledge, new expertise.
Why it matters: Fine-tuning is how generic models become useful for specific tasks. A fine-tuned model can learn your company's tone, your domain's terminology, or a specific output format without starting from scratch.
Foundation Model
Fundamentals
A large model trained on broad data that serves as a base for many different tasks. Claude, GPT, Gemini, and Llama are all foundation models. They're "foundational" because they can be adapted to almost anything — writing, coding, analysis, image understanding — without being specifically trained for each task.
Why it matters: Foundation models changed the economics of AI. Instead of training a separate model for every task, you train one massive model once and then fine-tune or prompt it for specific needs.
G
GAN
Generative Adversarial Network
Models
A model architecture where two neural networks compete: a generator creates fake data, and a discriminator tries to tell real from fake. Through this adversarial game, the generator gets better at creating realistic outputs. Dominated image generation from 2014 to ~2022.
Why it matters: GANs pioneered realistic AI image generation and are still used in some real-time applications. But diffusion models have largely replaced them for quality-critical work because GANs are harder to train and less diverse in their outputs.
GPU
Graphics Processing Unit
Infrastructure
Originally designed for rendering graphics, GPUs turned out to be perfect for AI because they can do thousands of math operations simultaneously. Training and running AI models is essentially massive matrix multiplication — exactly what GPUs are built for. NVIDIA dominates this market.
Why it matters: GPUs are the physical bottleneck of the entire AI industry. Why models cost what they cost, why some providers are faster than others, why there's a global chip shortage — it all comes back to GPU supply and VRAM.
Grounding
Using AI
Connecting a model's responses to factual, verifiable sources rather than letting it rely solely on its training data. Grounding techniques include RAG, web search integration, and citation requirements. A grounded response says "according to [source]" rather than just asserting facts.
Why it matters: Grounding is the primary defense against hallucination. An ungrounded model confidently invents facts. A grounded one points you to real sources you can verify.
Guardrails
Safety
Safety mechanisms that prevent AI models from generating harmful, inappropriate, or off-topic content. Guardrails can be built into the model during training (RLHF), applied through system prompts, or enforced by external filters that check outputs before they reach users.
Why it matters: Without guardrails, models will happily help with dangerous requests. The challenge is calibration — too strict and the model becomes useless ("I can't help with that"), too loose and it becomes unsafe.
H
Hallucination
Using AI
When an AI model generates information that sounds confident and plausible but is factually wrong or entirely fabricated. The model isn't "lying" — it's pattern-matching its way to fluent text without a concept of truth. Fake citations, invented statistics, and non-existent API methods are common examples.
Why it matters: Hallucination is the single biggest trust problem in AI today. It's why you should always verify critical facts from AI outputs, and why techniques like RAG and grounding exist.
I
Inference
Infrastructure
The process of running a trained model to generate outputs. Training is learning; inference is using what was learned. Every time you send a prompt to Claude or generate an image with Stable Diffusion, that's inference. It's what costs providers GPU hours and what you pay for per token.
Why it matters: Inference cost and speed determine the economics of AI products. Faster inference = lower latency = better UX. Cheaper inference = lower prices = wider adoption. The entire quantization and optimization industry exists to make inference more efficient.
L
Latency
Time to First Token (TTFT)
Infrastructure
The delay between sending a request and getting the first response. In AI, this is often measured as Time to First Token (TTFT) — how long before the model starts streaming its answer. Affected by model size, server load, network distance, and prompt length.
Why it matters: Users perceive anything over ~2 seconds as slow. Low latency is why smaller models often win for real-time applications even when larger models are "smarter." It's a key differentiator between providers.
Large Language Model
LLM
Fundamentals
A neural network trained on massive amounts of text to understand and generate human language. "Large" refers to the number of parameters (billions) and the size of the training data (trillions of tokens). Claude, GPT, Gemini, Llama, and Mistral are all LLMs.
Why it matters: LLMs are the technology behind every AI chat, code assistant, and text generator you use. Understanding what they are (statistical pattern matchers, not sentient beings) helps you use them effectively and recognize their limits.
LoRA
Low-Rank Adaptation
Training
A technique that makes fine-tuning dramatically cheaper by only training a small number of additional parameters instead of modifying the entire model. LoRA "adapters" are lightweight add-ons (often just megabytes) that modify a model's behavior without retraining its billions of parameters.
Why it matters: LoRA democratized fine-tuning. Before it, customizing a 7B model required serious GPU resources. Now you can fine-tune on a single consumer GPU in hours and share the tiny adapter file. It's why there are thousands of specialized models on HuggingFace.
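The savings are easy to see with arithmetic. For one weight matrix, full fine-tuning updates every entry, while LoRA trains two small matrices A (d_in × rank) and B (rank × d_out) whose product approximates the update. The 4096×4096 layer size below is a typical transformer dimension, used here for illustration.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int):
    """Compare trainable parameters: full fine-tuning vs. a LoRA adapter
    for a single weight matrix."""
    full = d_in * d_out            # every weight in the matrix
    lora = rank * (d_in + d_out)   # the two low-rank factor matrices
    return full, lora

full, lora = lora_param_counts(4096, 4096, rank=8)
# A rank-8 adapter trains well under 1% of this layer's weights
```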
M
MCP
Model Context Protocol
Tools
An open protocol (created by Anthropic) that standardizes how AI models connect to external tools and data sources. Think of it as USB-C for AI — one standard interface instead of custom integrations for every tool. MCP servers expose capabilities; MCP clients (like Claude) consume them.
Why it matters: Before MCP, every AI-tool integration was bespoke. MCP means a tool built once works with any compatible AI. It's already supported by Claude, Cursor, and others. This is how AI goes from chatbot to actual assistant.
Mixture of Experts
MoE
Models
An architecture where the model contains multiple "expert" sub-networks, but only activates a few of them for each input. A router network decides which experts are relevant for a given token. This means a model can have 100B+ total parameters but only use 20B for any single forward pass.
Why it matters: MoE is how models like Mixtral and (reportedly) GPT-4 get the quality of a huge model with the speed of a smaller one. The trade-off is higher memory usage (all experts must be loaded) even though computation is cheaper.
Multimodal
Fundamentals
A model that can understand and/or generate multiple types of data: text, images, audio, video, code. Claude can read images and text; some models can also produce images or speech. "Multimodal" contrasts with "unimodal" models that only handle one type.
Why it matters: Real-world tasks are multimodal. You want to show an AI a screenshot and ask "what's wrong here?" or give it a diagram and say "implement this." Multimodal models make that possible.
N
Neural Network
Fundamentals
A computing system loosely inspired by biological brains, made of layers of interconnected "neurons" (mathematical functions) that learn patterns from data. Information flows through layers, getting progressively transformed until the network produces an output. Every modern AI model is a neural network of some kind.
Why it matters: Neural networks are the "how" behind all of AI. Understanding that they're math (not magic, not brains) helps demystify what AI can and can't do. They're pattern matchers — extraordinarily powerful ones, but pattern matchers nonetheless.
O
Open Weights
Open Source (in AI context)
Safety
When a company releases a model's trained parameters for anyone to download and run. "Open weights" is more accurate than "open source" because most released models don't include training data or training code — you get the finished model but not the recipe. Llama, Mistral, and Qwen are open-weights models.
Why it matters: Open weights mean you can run AI on your own hardware with full privacy — no API calls, no data leaving your network. The trade-off is you need the GPU resources to run them and you're responsible for safety.
Overfitting
Training
When a model memorizes its training data too well and loses the ability to generalize to new inputs. Like a student who memorizes answers to practice tests but can't solve new problems. The model performs great on training data but poorly on anything it hasn't seen before.
Why it matters: Overfitting is the most common failure mode in model training. It's why evaluation uses separate test sets, and why training for too long (too many epochs) can actually make a model worse.
P
Pre-training
Training
The initial, massive training phase where a model learns language (or other modalities) from a huge corpus. This is the expensive part — thousands of GPUs running for weeks or months, costing millions of dollars. The result is a foundation model that understands language but hasn't been specialized for any task yet.
Why it matters: Pre-training is what makes foundation models possible. It's also why only a handful of companies can create frontier models — the compute costs are astronomical. Everything else (fine-tuning, RLHF, prompting) builds on this base.
Prompt Engineering
Using AI
The practice of crafting inputs to get better outputs from AI models. This ranges from simple techniques (being specific, providing examples) to advanced methods (chain of thought, few-shot prompting, role assignment). Despite the fancy name, it's fundamentally about communicating clearly with a statistical system.
Why it matters: The same model can give wildly different results depending on how you ask. Good prompt engineering is the cheapest way to improve AI output quality — no training, no fine-tuning, just better communication.
Q
Quantization
GGUF, GPTQ, AWQ
Infrastructure
Reducing a model's precision to make it smaller and faster. A model trained in 32-bit floating point can be quantized to 8-bit, 4-bit, or even lower — shrinking its size by 4-8x with surprisingly small quality loss. GGUF is the popular format for local inference via llama.cpp.
Why it matters: Quantization is what makes it possible to run a 14B parameter model on a single GPU or even a laptop. Without it, open-weights models would be unusable for most people. The Q4_K_M and Q5_K_M variants hit the sweet spot of size vs. quality.
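The core idea can be shown in a few lines: store small integers plus one scale factor instead of full-precision floats. Real quantizers (GGUF, GPTQ, AWQ) work per block with many refinements; this is a deliberately minimal symmetric 8-bit sketch.

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]
    plus a single scale factor that recovers the original range."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the stored integers."""
    return [qi * scale for qi in q]

original = [0.12, -0.83, 0.5, -0.01]
q, scale = quantize_int8(original)
restored = dequantize_int8(q, scale)  # close to, but not exactly, the originals
```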
R
RAG
Retrieval-Augmented Generation
Tools
A technique that gives AI models access to external knowledge by retrieving relevant documents before generating a response. Instead of relying only on what the model learned during training, RAG searches a knowledge base, finds relevant chunks, and includes them in the prompt as context.
Why it matters: RAG solves two major problems: hallucination (the model has real sources to reference) and knowledge cutoff (the knowledge base can be updated without retraining). It's how most enterprise AI actually works.
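The pipeline in miniature: search a knowledge base, take the best chunk, and prepend it to the prompt as context. Real RAG ranks by embedding similarity; the word-overlap scoring below is a stand-in so the sketch stays self-contained, and the two documents are invented examples.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query -- a stand-in for
    the embedding similarity a real RAG pipeline would use."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Retrieve relevant chunks and include them in the prompt as context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["Resetting a password requires the account email.",
        "Our office is closed on public holidays."]
prompt = build_rag_prompt("how do I reset my password", docs)
```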
Rate Limiting
Infrastructure
Restrictions on how many API requests you can make per minute/hour/day. Providers impose rate limits to prevent server overload and ensure fair access. Limits typically apply per API key and can restrict requests per minute (RPM) and tokens per minute (TPM).
Why it matters: Rate limits are the invisible ceiling you hit when scaling AI applications. They're why batch processing matters, why you need retry logic, and why some providers charge more for higher rate limits.
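The standard response to hitting a limit is retrying with exponential backoff: wait, double the wait, try again. A minimal sketch, with an injectable `sleep` so it can be tested; production code would also catch only rate-limit errors specifically and honor any Retry-After header the provider sends.

```python
import random
import time

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0,
                 sleep=time.sleep):
    """Retry a rate-limited API call with exponential backoff and jitter.

    `call` is any zero-argument function that raises on a 429 response.
    The delay doubles each attempt, with ~10% random jitter so many
    clients don't retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts -- let the caller handle it
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            sleep(delay)
```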
Red Teaming
Safety
The practice of deliberately trying to make an AI model fail, misbehave, or produce harmful outputs. Red teams probe for vulnerabilities: jailbreaks, bias, misinformation generation, privacy leaks. Named after military wargaming where a "red team" plays the adversary.
Why it matters: You can't fix what you don't know about. Red teaming is how providers discover that their model will explain how to pick locks if you ask it to "write a story about a locksmith." It's essential safety work that happens before every major model release.
RLHF
Reinforcement Learning from Human Feedback
Training
A training technique where human evaluators rank model outputs by quality, and this feedback is used to train a reward model that guides the AI toward better responses. It's what turns a raw pre-trained model (which just predicts next words) into a helpful, harmless assistant.
Why it matters: RLHF is the secret ingredient that made ChatGPT feel different from GPT-3. The base model already "knew" everything, but RLHF taught it to present that knowledge in a way humans actually find useful. It's also how safety behaviors are reinforced.
S
State Space Model
SSM, Mamba
Models
An alternative to Transformers that processes sequences by maintaining a compressed "state" instead of using attention over all tokens. Mamba is the most well-known SSM architecture. SSMs scale linearly with sequence length (vs. quadratic for attention), making them potentially much more efficient for very long contexts.
Why it matters: SSMs are the main challenger to Transformer dominance. They're faster for long sequences and use less memory, but the research is still maturing. Hybrid architectures (mixing SSM layers with attention) may end up being the best of both worlds.
System Prompt
System Message
Using AI
A special instruction given to a model at the start of a conversation that sets its behavior, personality, and rules. Unlike user messages, the system prompt is meant to be persistent and authoritative — it defines who the model is for this session. "You are a helpful coding assistant. Always use TypeScript."
Why it matters: System prompts are the primary tool for customizing AI behavior without fine-tuning. They're how companies make Claude act as a customer support agent, a code reviewer, or a medical information assistant — same model, different system prompt.
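In a chat API, the system prompt is typically just the first entry in the message list, or a separate top-level field, depending on the provider. The role names below follow common chat-API conventions and are illustrative, not any one provider's exact schema.

```python
def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    """Assemble a chat request where a persistent system prompt sets
    the model's behavior before any user input."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

messages = build_messages(
    "You are a helpful coding assistant. Always use TypeScript.",
    "Write a function that reverses a string.",
)
```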
T
Temperature
Using AI
A parameter that controls how random or deterministic a model's output is. Temperature 0 makes the model always pick the most probable next token (deterministic, focused). Temperature 1+ makes it more willing to pick less probable tokens (creative, unpredictable). Defaults vary by provider, commonly between 0.7 and 1.0.
Why it matters: Temperature is the creativity dial. Writing fiction? Turn it up. Generating code or factual answers? Turn it down. It's one of the most impactful parameters you can adjust, and it costs nothing to experiment with.
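Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A quick sketch with invented logits shows the effect: low temperature concentrates probability on the top token, high temperature spreads it out.

```python
import math

def token_probabilities(logits, temperature: float):
    """Convert raw model scores into sampling probabilities (softmax).

    Dividing by temperature first sharpens the distribution when
    temperature < 1 and flattens it when temperature > 1.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
focused = token_probabilities(logits, temperature=0.2)
creative = token_probabilities(logits, temperature=1.5)
```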
Token
Fundamentals
The basic unit of text that AI models process. A token is typically a word or word fragment — "understanding" might be one token, while "un" + "der" + "standing" could be three. On average, one token is roughly 3/4 of a word in English. Models read, think, and charge in tokens.
Why it matters: Tokens are the currency of AI. Context windows are measured in tokens. API pricing is per token. When a provider says "1M context" they mean 1 million tokens, roughly 750K words. Understanding tokens helps you estimate costs and optimize usage.
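For quick cost estimates, the common rule of thumb is roughly 4 characters per token in English. A minimal sketch, with an illustrative placeholder price; exact counts require the provider's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of
    thumb for English. Use the provider's tokenizer for exact counts."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, price_per_million: float) -> float:
    """Approximate input cost in dollars, given a price per 1M tokens
    (the price you pass in is whatever your provider charges)."""
    return estimate_tokens(text) / 1_000_000 * price_per_million

tokens = estimate_tokens("understanding the basics")
cost = estimate_cost("understanding the basics", price_per_million=3.0)
```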
Tool Use
Function Calling
Tools
The ability of an AI model to call external functions or tools during a conversation. Instead of just generating text, the model can decide to search the web, run code, query a database, or call an API — then incorporate the results into its response. The model outputs a structured "tool call" that the host application executes.
Why it matters: Tool use is what makes AI models actually useful beyond conversation. It's the mechanism behind code interpreters, web-browsing AI, and every AI agent. Without it, models are limited to what's in their training data.
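The loop looks like this from the host application's side: the model emits a structured call instead of prose, the host executes the matching function, and the result goes back into the conversation. The JSON shape and the `get_weather` stub below are illustrative; each provider defines its own tool-call format.

```python
import json

# The host application registers the real functions the model may call.
# get_weather here is a stub returning a canned string for illustration.
TOOLS = {
    "get_weather": lambda city: f"18C and cloudy in {city}",
}

def dispatch_tool_call(raw: str) -> str:
    """Execute a structured tool call emitted by the model.

    The {"name": ..., "arguments": ...} shape is illustrative -- real
    providers each define their own tool-call schema.
    """
    call = json.loads(raw)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# What the model might output instead of plain text:
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = dispatch_tool_call(model_output)  # fed back to the model as context
```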
Transformer
Models
The neural network architecture behind virtually all modern LLMs and many image/audio models. Introduced by Google in the 2017 paper "Attention Is All You Need," Transformers use self-attention to process all parts of an input simultaneously rather than sequentially, enabling massive parallelism during training.
Why it matters: Transformers are the architecture that made the current AI boom possible. GPT, Claude, Gemini, Llama, Mistral — they're all Transformers under the hood. Understanding this architecture helps you understand why models have the capabilities and limitations they do.
V
Vector Database
Qdrant, Pinecone, Weaviate, ChromaDB
Tools
A database optimized for storing and searching embeddings (vectors). Instead of matching exact keywords like a traditional database, vector databases find the most semantically similar items. You ask "how to fix a memory leak" and it returns documents about "debugging RAM consumption" because the embeddings are close.
Why it matters: Vector databases are the storage layer that makes RAG work. Without them, you'd need to embed your entire knowledge base on every query. They're also the backbone of recommendation systems and semantic search.
VRAM
Video RAM, GPU Memory
Infrastructure
The memory on a GPU, separate from system RAM. AI models must fit in VRAM to run on a GPU. A 7B parameter model in 16-bit precision needs ~14GB of VRAM. Consumer GPUs have 8-24GB; datacenter GPUs (A100, H100) have 40-80GB. VRAM is almost always the bottleneck for local AI.
Why it matters: VRAM determines which models you can run. It's why quantization exists (to shrink models to fit), why MoE models are tricky (all experts must fit in VRAM), and why GPU prices scale so steeply with memory. "Will it fit in VRAM?" is the first question of self-hosting AI.
Z
Zero-shot / Few-shot
In-context Learning
Using AI
Zero-shot means asking a model to do a task with no examples — just the instruction. Few-shot means providing a handful of input-output examples in the prompt before the actual request. "Here are 3 examples of how to format this data... now do this one." The model learns the pattern from context alone, no training required.
Why it matters: Few-shot prompting is the fastest way to teach a model a new format or behavior. Need consistent JSON output? Show it three examples. Need a specific writing style? Give it samples. It's free, instant, and surprisingly powerful.
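Assembling a few-shot prompt is plain string-building: example pairs first, then the real query with the output left blank for the model to complete. The `Input:`/`Output:` labels and the fruit-counting examples below are invented for illustration; any consistent format works.

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: input/output example pairs followed by
    the real query, whose output the model completes."""
    parts = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    [("3 apples, 2 pears", '{"apples": 3, "pears": 2}'),
     ("1 plum", '{"plums": 1}')],
    "5 kiwis",
)
```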