
Large Language Model

Also known as: LLM
A neural network trained on massive amounts of text to understand and generate human language. "Large" refers to the number of parameters (billions) and the size of the training data (trillions of tokens). Claude, GPT, Gemini, Llama, and Mistral are all LLMs.

Why it matters

LLMs are the technology behind every AI chat, code assistant, and text generator you use. Understanding what they are (statistical pattern matchers, not sentient beings) helps you use them effectively and recognize their limits.

Deep Dive

At its core, an LLM is a function that takes a sequence of tokens and outputs a probability distribution over the next token. That is the entire trick. During training, the model sees trillions of tokens of text and adjusts its billions of parameters to get better at predicting what comes next. When you chat with Claude or GPT, the model generates one token at a time, each time feeding its own previous output back in as input. This autoregressive process is why you see responses streaming in word by word — the model genuinely does not know what it will say next until it gets there.
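The generation loop described above can be sketched in a few lines. This is a toy model: the probability table stands in for the neural network that a real LLM uses to produce its next-token distribution, but the autoregressive loop itself (sample a token, append it, feed the longer sequence back in) is the same.

```python
import random

# Toy "LLM": maps a context (tuple of tokens) to a next-token
# probability distribution. A real model computes this distribution
# with a neural network over billions of parameters; this lookup
# table is a stand-in so the generation loop is visible.
NEXT_TOKEN_PROBS = {
    ("the",): {"cat": 0.6, "dog": 0.4},
    ("the", "cat"): {"sat": 0.9, "ran": 0.1},
    ("the", "cat", "sat"): {"<eos>": 1.0},
    ("the", "dog"): {"ran": 1.0},
    ("the", "dog", "ran"): {"<eos>": 1.0},
}

def generate(prompt, max_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS[tuple(tokens)]
        choices, probs = zip(*dist.items())
        # Sample the next token, then feed the extended sequence
        # back in: this is the autoregressive process. The model
        # does not "know" the full answer in advance.
        nxt = rng.choices(choices, weights=probs)[0]
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens

print(generate(["the"]))
```

Each pass through the loop is one "forward pass" of a real model, which is why response latency grows with output length.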

The Transformer Backbone

Most modern LLMs are built on the Transformer architecture, introduced by Google researchers in 2017. The Transformer's key innovation is the attention mechanism, which lets the model look at every other token in the input when deciding what a given token means. This solves a problem that plagued earlier architectures (RNNs, LSTMs): they struggled with long-range dependencies because information had to flow sequentially through every intermediate step. Attention lets a model directly connect "it" in paragraph five to "the database server" in paragraph one, regardless of how much text sits between them. Some newer architectures like Mamba use state-space models instead of attention, trading some flexibility for much better efficiency on long sequences, but Transformers remain the dominant paradigm for the largest models.
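A minimal sketch of scaled dot-product attention, the operation at the heart of the Transformer, using plain Python lists (real implementations use batched matrix multiplies on GPUs, and learned projections produce the queries, keys, and values):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over toy vectors.

    Each output is a weighted average of all the value vectors,
    with weights set by how well the query matches each key. This
    is how a token can attend directly to any other token,
    regardless of distance.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Blend the values according to the attention weights
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

Because every query scores against every key in one step, information does not have to flow sequentially through intermediate tokens the way it does in an RNN.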

Why Scale Matters

The "Large" in LLM is doing real work. Scale turns out to matter in ways researchers did not fully expect. A 1-billion-parameter model can handle basic grammar and simple facts. A 70-billion-parameter model can write working code and reason through multi-step problems. The largest models (hundreds of billions of parameters, trained on trillions of tokens) exhibit emergent capabilities — skills that appear suddenly at scale rather than improving gradually. Chain-of-thought reasoning, multilingual transfer, and in-context learning are all capabilities that only reliably show up once models cross certain size thresholds. This scaling behavior is described by "scaling laws" that relate model size, dataset size, and compute budget to performance in surprisingly predictable ways.
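The scaling laws mentioned above have a simple functional form. The sketch below uses the Chinchilla-style loss L(N, D) = E + A/N^alpha + B/D^beta; the constants are roughly those fitted by Hoffmann et al. (2022) and should be read as illustrative, not exact:

```python
def chinchilla_loss(n_params, n_tokens):
    # L(N, D) = E + A/N^alpha + B/D^beta
    # E: irreducible loss of natural text; the other two terms
    # shrink as the model (N) and dataset (D) grow.
    # Constants are approximately the Chinchilla fitted values —
    # treat them as illustrative.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Diminishing but predictable returns from scale:
for n in (1e9, 70e9, 400e9):
    print(f"{n:.0e} params -> loss {chinchilla_loss(n, 1.4e12):.3f}")
```

The predictability of this curve is what lets labs budget compute for a training run in advance; the emergent capabilities, by contrast, are not visible in the loss curve itself.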

From Predictor to Assistant

After pre-training, raw LLMs are not particularly useful to talk to — they just complete text, so they might continue your question with more questions instead of answering it. This is where alignment comes in. Techniques like RLHF (reinforcement learning from human feedback) and Constitutional AI train the model to be helpful, harmless, and honest rather than just a text predictor. This is the difference between a base model (like raw Llama) and a chat model (like Claude or ChatGPT). The base model has the knowledge; alignment teaches it how to use that knowledge in a conversation.

The Reliability Gap

A practical gotcha that catches many developers: LLMs do not "know" things the way a database does. They have encoded statistical patterns from training data, which means they can confidently state things that are subtly or completely wrong — hallucination. They also have a knowledge cutoff date and cannot access real-time information unless given tools. The best practitioners treat LLMs as very capable but unreliable collaborators: great for drafting, brainstorming, and code generation, but requiring verification for factual claims. Retrieval-augmented generation (RAG), structured output parsing, and tool use are the engineering patterns that make LLM-powered applications reliable in production.
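The RAG pattern mentioned above can be sketched in miniature. Here keyword overlap stands in for the embedding-based similarity search a production system would use, and the prompt format is an illustrative assumption, not any particular API:

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query — a toy
    stand-in for the vector similarity search real RAG uses."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, documents):
    # Grounding the model in retrieved text is what makes answers
    # checkable: factual claims can be traced back to a source
    # instead of relying on the model's (possibly hallucinated)
    # parametric memory.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

docs = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "France borders Spain and Italy.",
]
print(build_prompt("capital of France", docs))
```

The same structure also sidesteps the knowledge-cutoff problem: the retrieved documents can be newer than anything in the model's training data.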
