Using AI

Guidance Scale

CFG Scale, Classifier-Free Guidance
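A parameter that controls how strictly an image generation model follows the text prompt. Low guidance (1–3): the model generates freely, producing varied images that may drift from the prompt. High guidance (7–15): the model follows the prompt closely, but toward the top of that range images tend to come out oversaturated and full of artifacts. A typical sweet spot is 7–9, though it varies by model. It is loosely the image-generation counterpart of temperature.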

Why It Matters

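Guidance scale is the most influential parameter in image generation after the prompt itself. Too low, and the image ignores your description. Too high, and it looks oversaturated and artificial. Understanding guidance scale helps you diagnose "why doesn't my image match the prompt?" (guidance too low) and "why does my image look strange?" (guidance too high).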

Deep Dive

Classifier-free guidance (Ho & Salimans, 2022) works by computing two denoising predictions per step: one conditional (using your prompt) and one unconditional (ignoring the prompt). The final prediction amplifies the difference between them: output = unconditional + scale × (conditional − unconditional). Scale=0 ignores the prompt entirely, and scale=1 means no guidance (just the conditional prediction). Scale=7 multiplies the prompt's contribution, that is, the difference between the two predictions, by 7, pushing the output well past the plain conditional prediction.
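The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not any library's implementation: apply_cfg is a hypothetical helper name, and the short lists stand in for the much larger noise-prediction tensors a real model would produce.

```python
def apply_cfg(uncond, cond, scale):
    """Combine two predictions: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Toy stand-ins for the two predictions at one denoising step (assumed values).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]

prompt_ignored = apply_cfg(uncond, cond, 0.0)  # same as the unconditional prediction
no_guidance = apply_cfg(uncond, cond, 1.0)     # same as the conditional prediction
guided = apply_cfg(uncond, cond, 7.0)          # extrapolates past the conditional prediction

print(prompt_ignored, no_guidance, guided)
```

Elements where the two predictions agree (the middle value here) are left unchanged at any scale; only the parts the prompt actually changes get amplified.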

Why Higher Isn't Always Better

Higher guidance makes the image more "prompt-aligned" but at a cost: the model overshoots, producing oversaturated colors, unrealistic lighting, and visual artifacts. Very high guidance (15+) often produces images that look like they've been run through a sharpening filter: technically matching the prompt but aesthetically poor. The sweet spot depends on the model. Commonly used ranges are 7–9 for SD 1.5, 5–8 for SDXL, and 3–5 for Flux.
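One way to see the overshoot numerically: under the guidance formula, the distance between the guided output and the plain conditional prediction grows linearly with the scale. The sketch below assumes toy two-element predictions, and apply_cfg and overshoot are hypothetical helper names. It shows only the size of the extrapolation, not how a particular model's images degrade.

```python
import math

def apply_cfg(uncond, cond, scale):
    """Guidance formula: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def overshoot(uncond, cond, scale):
    """Euclidean distance between the guided output and the conditional prediction."""
    guided = apply_cfg(uncond, cond, scale)
    return math.sqrt(sum((g - c) ** 2 for g, c in zip(guided, cond)))

# Assumed toy predictions; the gap between them has length 5.
uncond = [0.0, 0.0]
cond = [3.0, 4.0]

for scale in (1.0, 3.0, 7.0, 15.0):
    # The overshoot equals (scale - 1) times the gap between the two predictions.
    print(scale, overshoot(uncond, cond, scale))
```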

Dynamic and Negative CFG

Advanced techniques manipulate guidance during generation: starting with high guidance (to establish composition) and reducing it in later steps (to refine details naturally). Negative CFG (guidance scale below 0) inverts the prompt's effect, pushing generation away from what is described. Values between 0 and 1 only weaken the prompt rather than invert it. Negative CFG is useful for understanding what the model associates with specific concepts but rarely useful for actual image generation.
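A dynamic schedule of the kind described above might look like the following sketch. It assumes a simple linear decay; guidance_schedule and the default values 9.0 and 4.0 are illustrative choices, not taken from any particular tool, and real implementations may use other curves.

```python
def guidance_schedule(step, num_steps, start_scale=9.0, end_scale=4.0):
    """Linearly decay the guidance scale from start_scale (first step) to end_scale (last step)."""
    if num_steps < 2:
        return start_scale
    t = step / (num_steps - 1)  # progress through sampling, from 0.0 to 1.0
    return start_scale + t * (end_scale - start_scale)

# Scale to use at each of 5 denoising steps (hypothetical step count).
scales = [guidance_schedule(i, 5) for i in range(5)]
print(scales)
```

In a sampling loop, the value returned for each step would replace the fixed scale in the guidance formula, so early steps lean hard on the prompt and later steps relax.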
