
Instruction Following

Instruction Adherence
The model's ability to execute a user's request exactly as stated: respecting format constraints, length requirements, style rules, and behavioral directives. "Write exactly 3 bullet points in French about X" tests instruction following: the response must be bullets (not paragraphs), exactly 3 of them (not 2 or 5), in French (not English), and about X (not Y).

Why It Matters

Instruction following is arguably the most practically important LLM capability. Users care less about how many facts a model "knows" and more about whether it actually does what they asked. A model that writes beautiful prose but ignores your formatting requirements is less useful than one that reliably follows instructions. This is why IFEval and other instruction-following benchmarks have become central to model evaluation.

Deep Dive

Instruction following is trained through instruction tuning (SFT on instruction-response pairs) and refined through RLHF/DPO (learning to prefer responses that accurately follow instructions). The quality of instruction-following depends heavily on the diversity and precision of the training data: models that see many examples of "exactly 3 items" learn to count; models that only see vague instructions don't.
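As a rough illustration, the sketch below shows how a single instruction-response pair might become an SFT training example, with the prompt masked so the loss is computed only on the response tokens. The "User:/Assistant:" template and the tokenizer interface are illustrative assumptions, not any particular model's chat format.

```python
# Minimal sketch of turning one instruction-response pair into an SFT example.
# The prompt template and tokenizer interface are illustrative assumptions.

def build_sft_example(instruction: str, response: str, tokenizer) -> dict:
    """Tokenize prompt + response; mask the prompt so the loss is computed
    only on the response tokens the model should learn to produce."""
    prompt = f"User: {instruction}\nAssistant: "
    prompt_ids = tokenizer.encode(prompt)
    response_ids = tokenizer.encode(response)

    input_ids = prompt_ids + response_ids
    # -100 is the conventional "ignore" label for cross-entropy loss,
    # so only the response positions contribute to the training signal.
    labels = [-100] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}
```

Data built this way only teaches precise constraint-following if the instructions themselves are precise: pairs like "list exactly 3 items" matched with genuinely 3-item answers are what give the model something to learn counting from.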

Where Models Fail

Common instruction-following failures: ignoring length constraints ("be brief" → still writes paragraphs), format drift (starting with the requested format but reverting to prose), constraint amnesia (following the first constraint but forgetting later ones in a complex instruction), and over-following (interpreting ambiguous instructions too literally or too broadly). These failures are more common in smaller models and become rarer with scale, but even frontier models occasionally miss constraints.
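Because many of these constraints are mechanically checkable, benchmarks like IFEval verify them programmatically. The snippet below is a minimal checker in that spirit (illustrative, not IFEval's actual implementation), covering a word-count cap and an exact bullet count:

```python
import re

def check_constraints(text: str, max_words: int | None = None,
                      exact_bullets: int | None = None) -> dict:
    """Verify simple, mechanically checkable instruction constraints."""
    results = {}
    if max_words is not None:
        results["length_ok"] = len(text.split()) <= max_words
    if exact_bullets is not None:
        bullets = [line for line in text.splitlines()
                   if re.match(r"^\s*[-*•]", line)]
        results["bullet_count_ok"] = len(bullets) == exact_bullets
    return results

# "Be brief" drift and "exactly 3 items" miscounts both surface as failed checks.
print(check_constraints("- one\n- two\n- three\nAnd some extra prose.",
                        max_words=10, exact_bullets=3))
```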

System Prompts and Hierarchy

Instruction following becomes complex when instructions conflict: the system prompt says "always respond in JSON" but the user says "write me a poem." Most models implement an instruction hierarchy where system-level instructions take precedence over user messages, but the boundaries are fuzzy. Well-designed applications structure their instruction hierarchy clearly and test edge cases where different levels of instructions might conflict.
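As a concrete illustration, consider a chat-style message list where the system and user instructions conflict. The dictionary format below is illustrative, not tied to any specific API:

```python
# Illustrative chat-style message list (format only; not a specific API).
# The system constraint and the user request pull in different directions,
# which is exactly the kind of edge case hierarchy testing should cover.
messages = [
    {"role": "system", "content": "Always respond with a single JSON object."},
    {"role": "user", "content": "Write me a poem."},
]
# A model that honors the hierarchy should still return JSON,
# e.g. {"poem": "..."}, rather than free-form verse.
```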

Related Concepts

Inference · Instruction Tuning