Fundamentals

Large Language Model

Also known as: LLM
A neural network trained on massive amounts of text to understand and generate human language. "Large" refers to the number of parameters (billions) and the size of the training data (trillions of tokens). Claude, GPT, Gemini, Llama, and Mistral are all LLMs.

Why It Matters

LLMs are the technology behind every AI chat, coding assistant, and text generator you use. Understanding what they are (statistical pattern-matchers, not conscious beings) helps you use them effectively and recognize their limits.

Deep Dive

At its core, an LLM is a function that takes a sequence of tokens and outputs a probability distribution over the next token. That is the entire trick. During training, the model sees trillions of tokens of text and adjusts its billions of parameters to get better at predicting what comes next. When you chat with Claude or GPT, the model generates one token at a time, each time feeding its own previous output back in as input. This autoregressive process is why you see responses streaming in word by word — the model genuinely does not know what it will say next until it gets there.
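The autoregressive loop described above can be sketched in plain Python. This is a toy with a hard-coded bigram table standing in for the model — a real LLM replaces `toy_next_token_logits` with a neural network over tens of thousands of tokens — but the generate-sample-append structure is the same:

```python
import math
import random

def softmax(logits):
    # Convert raw scores into a probability distribution over tokens.
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

def toy_next_token_logits(context):
    # Stand-in for a real model: scores candidate next tokens based
    # only on the last token of the context (a toy bigram table).
    table = {
        "the": {"cat": 2.0, "dog": 1.5, "<end>": 0.1},
        "cat": {"sat": 2.5, "ran": 1.0, "<end>": 0.5},
        "dog": {"ran": 2.0, "sat": 0.8, "<end>": 0.5},
        "sat": {"<end>": 3.0, "the": 0.2},
        "ran": {"<end>": 3.0, "the": 0.2},
    }
    return table[context[-1]]

def generate(prompt, max_tokens=10, seed=0):
    # Autoregressive loop: sample one token, append it to the
    # context, and feed the extended context back in.
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_tokens):
        probs = softmax(toy_next_token_logits(tokens))
        choices, weights = zip(*probs.items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens

print(generate(["the"]))
```

Because each step samples from a distribution, the same prompt can yield different continuations — which is exactly why LLM outputs vary between runs unless sampling is made deterministic.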

The Transformer Backbone

Most modern LLMs are built on the Transformer architecture, introduced by Google researchers in 2017. The Transformer's key innovation is the attention mechanism, which lets the model look at every other token in the input when deciding what a given token means. This solves a problem that plagued earlier architectures (RNNs, LSTMs): they struggled with long-range dependencies because information had to flow sequentially through every intermediate step. Attention lets a model directly connect "it" in paragraph five to "the database server" in paragraph one, regardless of how much text sits between them. Some newer architectures like Mamba use state-space models instead of attention, trading some flexibility for much better efficiency on long sequences, but Transformers remain the dominant paradigm for the largest models.
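The attention computation at the heart of the Transformer fits in a few lines of NumPy. This sketch shows single-head scaled dot-product attention with no masking or learned projection matrices (real implementations add both): each token's output is a weighted average of all tokens' value vectors, with weights derived from query-key similarity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j]: how strongly token i attends to token j.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax so each token's attention weights sum to 1.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: per-token weighted mix of all value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, 8-dim embeddings
out, w = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (5, 8): one mixed vector per input token
```

Note that token 1 can attend to token 5 with the same single matrix multiply as to its neighbor — this is the direct long-range connection that RNNs and LSTMs lacked.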

Why Scale Matters

The "Large" in LLM is doing real work. Scale turns out to matter in ways researchers did not fully expect. A 1-billion-parameter model can handle basic grammar and simple facts. A 70-billion-parameter model can write working code and reason through multi-step problems. The largest models (hundreds of billions of parameters, trained on trillions of tokens) exhibit emergent capabilities — skills that appear suddenly at scale rather than improving gradually. Chain-of-thought reasoning, multilingual transfer, and in-context learning are all capabilities that only reliably show up once models cross certain size thresholds. This scaling behavior is described by "scaling laws" that relate model size, dataset size, and compute budget to performance in surprisingly predictable ways.
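One well-known example of such a law is the parametric fit from the Chinchilla paper (Hoffmann et al., 2022), which predicts training loss from parameter count N and token count D as L(N, D) = E + A/N^α + B/D^β. A small sketch with coefficients in the ballpark of the paper's reported fit (treat the exact values as illustrative, not authoritative):

```python
def chinchilla_loss(n_params, n_tokens):
    # Coefficients roughly as fitted in Hoffmann et al. (2022).
    # E is irreducible loss; the other terms shrink with scale.
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

for n in (1e9, 70e9, 400e9):
    # Rough compute-optimal heuristic: ~20 training tokens per parameter.
    d = 20 * n
    print(f"{n:.0e} params: predicted loss {chinchilla_loss(n, d):.3f}")
```

The key property is the predictability: loss falls smoothly as a power law in both N and D, even though individual capabilities (like chain-of-thought) can appear abruptly along that smooth curve.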

From Predictor to Assistant

After pre-training, raw LLMs are not particularly useful to talk to — they just want to complete text, so they might continue your question with more questions instead of answering. This is where alignment comes in. Techniques like RLHF (reinforcement learning from human feedback) and constitutional AI train the model to be helpful, harmless, and honest rather than just a text predictor. This is the difference between a base model (like raw Llama) and a chat model (like Claude or ChatGPT). The base model has the knowledge; alignment teaches it how to use that knowledge in a conversation.

The Reliability Gap

A practical gotcha that catches many developers: LLMs do not "know" things the way a database does. They have encoded statistical patterns from training data, which means they can confidently state things that are subtly or completely wrong — hallucination. They also have a knowledge cutoff date and cannot access real-time information unless given tools. The best practitioners treat LLMs as very capable but unreliable collaborators: great for drafting, brainstorming, and code generation, but requiring verification for factual claims. Retrieval-augmented generation (RAG), structured output parsing, and tool use are the engineering patterns that make LLM-powered applications reliable in production.
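As an illustration of the RAG pattern mentioned above, here is a toy retriever: fetch the most relevant stored document for a question, then build a prompt that grounds the model's answer in it. Real systems use neural embeddings and a vector database rather than this bag-of-words similarity, and all names here are hypothetical:

```python
def embed(text, vocab):
    # Toy bag-of-words "embedding"; real RAG uses a neural encoder.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    # Rank stored documents by similarity to the query; return top-k.
    vocab = sorted({w for d in documents for w in d.lower().split()})
    q = embed(query, vocab)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d, vocab)),
                    reverse=True)
    return ranked[:k]

docs = [
    "The database server restarts nightly at 02:00 UTC.",
    "Our API rate limit is 100 requests per minute.",
    "Deploys are triggered from the main branch.",
]
question = "when does the database restart?"
context = retrieve(question, docs, k=1)[0]
# Grounding the prompt in retrieved text is what curbs hallucination:
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The model still generates the final answer, but because the relevant fact is in its context window, it no longer has to rely on (possibly stale or wrong) training-data memory.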
