
Autoregressive

Autoregressive Model, Next-Token Prediction
A model that generates one token at a time, with each new token predicted from all the tokens before it. Every modern LLM (Claude, GPT, Llama, Gemini) is autoregressive. The model does not "plan" a complete response and then write it; it literally predicts the next word, appends it, then predicts the next one, over and over, until it decides to stop.
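The predict-append-repeat loop can be sketched in a few lines of Python. Here `toy_model` is a hypothetical stand-in for the neural network: a real LLM returns a probability distribution over a vocabulary of roughly 100k tokens, while this toy version hard-codes one on a five-word vocabulary.

```python
# A toy stand-in for a language model: given the tokens so far, return a
# probability distribution over a tiny vocabulary. (Hypothetical example;
# a real LLM computes this with a neural network.)
VOCAB = ["the", "cat", "sat", "down", "<eos>"]

def toy_model(tokens):
    # Crude "model": strongly favor one fixed continuation of the last token.
    order = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}
    probs = {t: 0.02 for t in VOCAB}
    probs[order.get(tokens[-1], "<eos>")] = 1.0 - 0.02 * (len(VOCAB) - 1)
    return probs

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = toy_model(tokens)               # full distribution at each step
        next_token = max(probs, key=probs.get)  # greedy: pick the argmax
        if next_token == "<eos>":               # the model "decides to stop"
            break
        tokens.append(next_token)               # append, then predict again
    return tokens

print(generate(["the"]))  # → ['the', 'cat', 'sat', 'down']
```

Note that `generate` never edits `tokens` it has already emitted; it can only append. That one-directional loop is the entire mechanism the rest of this entry builds on.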

Why It Matters

Understanding autoregressive generation explains most LLM behavior: why responses stream out token by token, why a model sometimes contradicts itself mid-paragraph, why longer outputs are slower and more expensive, and why you can't easily ask the model to "go back and fix the beginning." The model only moves forward, one token at a time.

Deep Dive

Autoregressive generation sounds simple — predict the next token, repeat — but the implications run deep. The model produces a probability distribution over its entire vocabulary at each step. The token that gets selected depends on sampling parameters like temperature and top-p. At temperature 0, the model always picks the highest-probability token (greedy decoding). At higher temperatures, lower-probability tokens have a real chance of being selected, which is where creativity and variety come from.
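That sampling step can be sketched as follows. The function name `sample_token` and the logit values are invented for this example; it shows the standard recipe (temperature scaling, softmax, nucleus/top-p truncation), while real inference stacks add further refinements such as top-k and repetition penalties.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0):
    """Pick one token index from raw logits using temperature and top-p."""
    if temperature == 0:
        # Greedy decoding: always take the highest-probability token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p (nucleus) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then sample within that set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    r = random.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

logits = [2.0, 1.0, 0.1]
print(sample_token(logits, temperature=0))   # → 0 (greedy argmax)
print(sample_token(logits, temperature=1.5)) # varies run to run
```

Raising `temperature` flattens the distribution, so lower-probability tokens win more often; lowering `top_p` trims the long tail before sampling.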

Why It's Slow

During input processing, the model can process all your prompt tokens in parallel — this is called the "prefill" phase. But during generation, each new token requires a full forward pass through the entire model, and that pass can't start until the previous token is decided. This sequential bottleneck is why output generation is much slower than input processing, and why output tokens cost more. A 1000-token response requires 1000 serial forward passes, regardless of how many GPUs you have.
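The resulting cost structure can be illustrated with a back-of-the-envelope latency model. The throughput numbers below are invented for illustration, not measurements of any real system, but the shape is typical: prefill is parallel and fast, decoding is serial and dominates.

```python
# Illustrative latency model for autoregressive serving.
# Throughput numbers are assumptions, not measurements.
def generation_time(prompt_tokens, output_tokens,
                    prefill_tokens_per_s=10_000,
                    decode_tokens_per_s=50):
    prefill = prompt_tokens / prefill_tokens_per_s  # parallel over the prompt
    decode = output_tokens / decode_tokens_per_s    # one serial pass per token
    return prefill + decode

# A 2000-token prompt is absorbed in a fraction of a second, while the
# 1000-token response dominates the wall clock.
print(generation_time(2000, 1000))  # → 20.2 (0.2 s prefill + 20 s decode)
```

This is also why output tokens are priced higher than input tokens: each one costs a full forward pass that cannot be parallelized away.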

The Consequences of Forward-Only

Because the model can only move forward, it can't revise earlier tokens based on later insights. If it starts a sentence with "There are three reasons:" and then realizes there are actually four, it can't go back — it has to either awkwardly squeeze in a fourth or pretend there were only three. This is why chain-of-thought prompting helps: by asking the model to think before answering, you give it a chance to work through the problem before committing to a final answer. The "thinking" tokens become scaffolding that shapes the answer tokens that follow.

Alternatives Exist

Not all generative models are autoregressive. Diffusion models (used for images) generate everything at once and iteratively refine. Some research explores non-autoregressive text generation, where the model predicts all tokens simultaneously and then iterates. But for text, autoregressive remains dominant because language has a strong left-to-right (or right-to-left) sequential structure that autoregressive models exploit naturally. The question isn't whether autoregressive will be replaced, but whether hybrid approaches can get the best of both worlds.
