Knowledge Cutoff: Definition & Meaning — AI Wiki

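A knowledge cutoff is the date after which a model has no training data, which means it lacks knowledge of events, discoveries, and changes that occurred after that date. If a model's cutoff is April 2024, it does not know about anything that happened in May 2024 or later: new products, news events, scientific papers, updated facts.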

Why It Matters

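The knowledge cutoff is the most common source of frustration with AI assistants. "Why doesn't it know X?" Because X happened after training. This limitation drives the adoption of RAG (giving the model access to current information) and tool use (letting the model search the web). Understanding the cutoff helps you know when to trust the model and when to verify.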

Deep Dive

The cutoff exists because training data must be collected, cleaned, and processed before training begins, a process that takes weeks to months. A model released in 2025 might therefore have a training data cutoff of late 2024. The gap between cutoff and release reflects this processing time, along with the training run itself. Some providers do additional "knowledge updates" through fine-tuning on more recent data, but these are typically narrow (news events, product launches) rather than comprehensive.

Not a Hard Wall

The cutoff isn't perfectly clean. Training data often includes content published over a range of dates, and web scrapes may include pages last updated at various times. A model might know some things from after its "official" cutoff because of overlapping data collection. It might also have gaps in knowledge from before the cutoff if certain sources weren't included. The cutoff date is a rough guide, not a precise boundary.
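The "rough guide" idea can be sketched as a date check with a fuzzy band around the cutoff instead of a single hard boundary. This is only an illustration: the function name `classify_event`, the three labels, and the 90-day margin are assumptions made for the example, not values published by any provider.

```python
from datetime import date, timedelta


def classify_event(event_date: date, cutoff: date, margin_days: int = 90) -> str:
    """Rough guess at how far to trust a model's knowledge of an event.

    margin_days is an assumed width for the fuzzy zone around the cutoff.
    """
    margin = timedelta(days=margin_days)
    if event_date > cutoff + margin:
        # Well after the cutoff: the model almost certainly lacks it.
        return "unknown"
    if event_date >= cutoff - margin:
        # Near the boundary: coverage is patchy, so verify.
        return "uncertain"
    # Well before the cutoff, though gaps are still possible.
    return "likely_known"


cutoff = date(2024, 4, 30)  # hypothetical cutoff, matching the April 2024 example
for event in (date(2023, 6, 1), date(2024, 5, 15), date(2025, 1, 10)):
    print(event, classify_event(event, cutoff))
```

An event a few weeks past the stated cutoff lands in the "uncertain" band rather than being ruled out, which reflects the overlapping data collection described above.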

Working Around It

Three approaches address the cutoff limitation: RAG (retrieve current documents and include them in the prompt), web search tools (let the model search for current information), and regular model updates (retraining or fine-tuning on recent data). In practice, most production applications use RAG or tool use rather than relying solely on the model's internal knowledge. This holds even for information within the training period, because the model's parametric knowledge can be imprecise even for things it "knows."
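The RAG approach (retrieve current documents, then include them in the prompt) can be sketched in a few lines. This is a toy version under stated assumptions: `Document`, `retrieve`, and `build_prompt` are hypothetical names, the retriever ranks by simple word overlap instead of embeddings, and the call to an actual model is left out.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    text: str


def retrieve(query: str, docs: list[Document], k: int = 2) -> list[Document]:
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(d.text.lower().split())), d) for d in docs]
    # Keep only documents with at least one overlapping word, best first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:k]]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Put the retrieved text in front of the question."""
    context = "\n\n".join(f"[{d.title}]\n{d.text}" for d in docs)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical corpus containing information newer than the model's cutoff.
corpus = [
    Document("Release notes", "version 3.2 was released in june with a new scheduler"),
    Document("FAQ", "billing questions go to the support team"),
]
question = "when was version 3.2 released"
print(build_prompt(question, retrieve(question, corpus)))
```

The resulting string is what would be sent to the model, so the answer can draw on the retrieved text instead of parametric knowledge alone.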
