Fundamentals

Autoregressive

Autoregressive Model, Next-Token Prediction
A model that generates one token at a time, where each new token is predicted from all the tokens before it. Every modern LLM (Claude, GPT, Llama, Gemini) is autoregressive. The model does not "plan" a complete response and then write it; it literally predicts the next word, appends it, then predicts the next one, over and over, until it decides to stop.
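A minimal sketch of that loop is below. The `model.next_token_logits` call and `sample_fn` helper are hypothetical stand-ins for one forward pass and one token-selection step, not any real library's API.

```python
def generate(model, prompt_tokens, max_new_tokens, eos_token_id, sample_fn):
    """Run the predict -> append -> repeat loop until EOS or the token budget."""
    tokens = list(prompt_tokens)                  # the growing context
    for _ in range(max_new_tokens):
        logits = model.next_token_logits(tokens)  # one forward pass: scores over the vocab
        next_token = sample_fn(logits)            # pick one token (greedy or sampled)
        tokens.append(next_token)                 # append and move forward; never revise
        if next_token == eos_token_id:            # the model "decides to stop"
            break
    return tokens
```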

Why It Matters

Understanding autoregressive generation explains most LLM behavior: why responses stream out token by token, why the model sometimes contradicts itself mid-paragraph, why longer outputs are slower and more expensive, and why you can't easily make the model "go back and fix the beginning". The model only moves forward, one token at a time.

Deep Dive

Autoregressive generation sounds simple — predict the next token, repeat — but the implications run deep. The model produces a probability distribution over its entire vocabulary at each step. The token that gets selected depends on sampling parameters like temperature and top-p. At temperature 0, the model always picks the highest-probability token (greedy decoding). At higher temperatures, lower-probability tokens have a real chance of being selected, which is where creativity and variety come from.
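As a concrete sketch of that selection step, the function below turns one step's logits into a sampled token id with temperature and top-p applied. The function name and defaults are illustrative, not any particular library's API.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample one token id from raw logits with temperature and top-p (nucleus) filtering."""
    rng = rng or np.random.default_rng()

    # Temperature 0 degenerates to greedy decoding: always take the argmax.
    if temperature == 0.0:
        return int(np.argmax(logits))

    # Temperature rescales logits before the softmax; higher values flatten
    # the distribution, giving lower-probability tokens a real chance.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-p keeps the smallest set of tokens whose cumulative probability
    # reaches top_p, then renormalizes over that set.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```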

Why It's Slow

During input processing, the model can process all your prompt tokens in parallel — this is called the "prefill" phase. But during generation, each new token requires a full forward pass through the entire model, and that pass can't start until the previous token is decided. This sequential bottleneck is why output generation is much slower than input processing, and why output tokens cost more. A 1000-token response requires 1000 serial forward passes, regardless of how many GPUs you have.
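A back-of-envelope latency model makes the split concrete: prefill cost scales with prompt length but is parallel, while decode cost is one serial pass per output token. The throughput numbers below are made-up illustrative values, not benchmarks of any real system.

```python
def estimated_latency(prompt_tokens, output_tokens,
                      prefill_tokens_per_s=10_000,  # prompt tokens processed in parallel
                      decode_tokens_per_s=50):      # output tokens generated one at a time
    prefill_time = prompt_tokens / prefill_tokens_per_s
    decode_time = output_tokens / decode_tokens_per_s  # one forward pass per output token
    return prefill_time + decode_time

# Even with a long prompt, the 1000 serial decode steps dominate total time:
print(estimated_latency(prompt_tokens=4000, output_tokens=1000))  # ~20.4 seconds
```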

The Consequences of Forward-Only

Because the model can only move forward, it can't revise earlier tokens based on later insights. If it starts a sentence with "There are three reasons:" and then realizes there are actually four, it can't go back — it has to either awkwardly squeeze in a fourth or pretend there were only three. This is why chain-of-thought prompting helps: by asking the model to think before answering, you give it a chance to work through the problem before committing to a final answer. The "thinking" tokens become scaffolding that shapes the answer tokens that follow.
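One way to see the "scaffolding" point: thinking tokens are ordinary tokens appended to the same left-to-right stream, so every later token is conditioned on them. The sketch below reuses the hypothetical `generate` loop from above; `model`, `tokenize`, `EOS`, and `greedy` are illustrative stand-ins.

```python
# Both calls use the same forward-only loop; the only difference is what ends
# up in the context before the final answer tokens are predicted.
direct = generate(model, tokenize("Q: 17 * 24 = ? A:"),
                  max_new_tokens=8, eos_token_id=EOS, sample_fn=greedy)

cot = generate(model, tokenize("Q: 17 * 24 = ? Think step by step, then answer. A:"),
               max_new_tokens=128, eos_token_id=EOS, sample_fn=greedy)
# In the second call, the intermediate reasoning tokens sit in the context,
# so the final answer tokens are predicted conditioned on them.
```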

Alternatives Exist

Not all generative models are autoregressive. Diffusion models (used for images) generate everything at once and iteratively refine. Some research explores non-autoregressive text generation, where the model predicts all tokens simultaneously and then iterates. But for text, autoregressive remains dominant because language has a strong left-to-right (or right-to-left) sequential structure that autoregressive models exploit naturally. The question isn't whether autoregressive will be replaced, but whether hybrid approaches can get the best of both worlds.
