Using AI

Guidance Scale

CFG Scale, Classifier-Free Guidance
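A parameter that controls how strictly an image generation model follows the text prompt. Low guidance (1–3): the model generates freely, producing diverse images that may drift off-topic. High guidance (7–15): the model follows the prompt closely but may produce oversaturated, artifact-heavy images. A typical sweet spot is 7–9. It is the image-generation counterpart of temperature.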

Why It Matters

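Guidance scale is the most influential parameter in image generation after the prompt itself. Too low, and the image ignores your description. Too high, and it looks oversaturated and artificial. Understanding guidance scale helps you diagnose "why doesn't my image match the prompt?" (guidance too low) and "why does my image look strange?" (guidance too high).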

Deep Dive

Classifier-free guidance (Ho & Salimans, 2022) computes two denoising predictions per step: one conditional (using your prompt) and one unconditional (ignoring the prompt). The final prediction extrapolates along the difference between them: output = unconditional + scale × (conditional − unconditional). Scale = 0 returns the unconditional prediction, and scale = 1 returns the conditional prediction unchanged, so no guidance is applied. Scale = 7 multiplies the conditional-minus-unconditional difference by 7, pushing the prediction well beyond what the prompt-conditioned model would produce on its own.
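The combination rule above can be sketched in a few lines. This is a minimal illustration, not any library's implementation: the function name apply_cfg is hypothetical, and the two short lists stand in for the noise predictions a real model would return as large tensors.

```python
def apply_cfg(uncond, cond, scale):
    """Combine two predictions: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Toy stand-ins for the unconditional and conditional predictions (assumed values).
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

no_prompt = apply_cfg(uncond, cond, 0.0)  # equals uncond: prompt ignored
unguided = apply_cfg(uncond, cond, 1.0)   # equals cond: no extra guidance
guided = apply_cfg(uncond, cond, 7.0)     # extrapolates past cond

print(no_prompt, unguided, guided)
```

With scale = 7 each element lands seven "difference units" away from the unconditional value, which is why large scales overshoot.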

Why Higher Isn't Always Better

Higher guidance makes the image more "prompt-aligned" but at a cost: the model overshoots, producing oversaturated colors, unrealistic lighting, and visual artifacts. Very high guidance (15+) often produces images that look like they've been run through a sharpening filter — technically matching the prompt but aesthetically poor. The sweet spot depends on the model: SD 1.5 works well at 7–9, SDXL at 5–8, and Flux at 3–5.

Dynamic and Negative CFG

Advanced techniques manipulate guidance during generation: starting with high guidance (to establish composition) and reducing it in later steps (to refine details naturally). Negative CFG (guidance scale below 0) inverts the prompt's effect, pushing the prediction away from what is described. Values between 0 and 1 only weaken the prompt's influence; they do not invert it. Negative CFG is useful for understanding what the model associates with specific concepts but rarely useful for actual image generation.
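A rough sketch of both ideas follows. The linear decay, the start and end values (9.0 and 4.0), and the function names are illustrative assumptions; the text does not prescribe a particular schedule, and real pipelines apply the scale to model tensors rather than short lists.

```python
def guidance_schedule(step, num_steps, start=9.0, end=4.0):
    """Linearly decay the guidance scale from start to end (assumed schedule)."""
    if num_steps < 2:
        return start
    t = step / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return start + t * (end - start)

def apply_cfg(uncond, cond, scale):
    """Standard CFG combination, element-wise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# High guidance early (composition), lower guidance late (detail).
scales = [guidance_schedule(i, 5) for i in range(5)]

# Toy predictions (assumed values) to show the effect of the sign of the scale.
uncond, cond = [0.0, 1.0], [1.0, 3.0]
weakened = apply_cfg(uncond, cond, 0.5)   # between uncond and cond
inverted = apply_cfg(uncond, cond, -1.0)  # pushed away from cond

print(scales, weakened, inverted)
```

A scale of 0.5 lands halfway between the two predictions, while a scale of -1 moves to the opposite side of the unconditional prediction.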
