Guidance Scale

CFG Scale, Classifier-Free Guidance
A parameter that controls how strongly an image generation model follows the text prompt. Low guidance (1–3): the model generates freely, producing diverse but potentially off-topic images. High guidance (10–15): the model follows the prompt strictly but may produce oversaturated, artifact-heavy images. A common sweet spot is 7–9, though it varies by model. It is loosely analogous to temperature for text models, with the direction reversed: raising temperature increases diversity, whereas raising guidance reduces it.

Why it matters

Guidance scale is arguably the most impactful setting in image generation after the prompt itself. Too low and the image ignores your description. Too high and it looks oversaturated and artificial. Understanding guidance scale helps you troubleshoot "why doesn't my image match my prompt?" (guidance may be too low) and "why does my image look weird?" (guidance may be too high).

Deep Dive

Classifier-free guidance (Ho & Salimans, 2022) works by computing two denoising predictions per step: one conditional (using your prompt) and one unconditional (ignoring the prompt). The final prediction amplifies the difference: output = unconditional + scale × (conditional − unconditional). Scale=0 returns the unconditional prediction, and scale=1 means no guidance (just the conditional prediction). Scale=7 places the output seven times as far from the unconditional prediction as the conditional prediction lies, in the direction the prompt points.
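The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation: the function name apply_cfg and the toy arrays are hypothetical stand-ins for the two noise predictions a real model would produce at each step.

```python
import numpy as np

def apply_cfg(uncond, cond, scale):
    """Combine two noise predictions with classifier-free guidance.

    output = uncond + scale * (cond - uncond)
    """
    return uncond + scale * (cond - uncond)

# Toy values standing in for a model's predictions (hypothetical).
uncond = np.array([0.0, 1.0, -1.0])   # prediction that ignores the prompt
cond = np.array([1.0, 1.5, -1.0])     # prediction that uses the prompt

no_prompt = apply_cfg(uncond, cond, 0.0)  # scale 0: unconditional prediction
unguided = apply_cfg(uncond, cond, 1.0)   # scale 1: conditional prediction
guided = apply_cfg(uncond, cond, 7.0)     # scale 7: difference amplified 7x

print(guided)
```

Components where the two predictions agree (the last element here) are left unchanged at any scale; only the parts the prompt actually influences are amplified.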

Why Higher Isn't Always Better

Higher guidance makes the image more "prompt-aligned" but at a cost: the model overshoots, producing oversaturated colors, unrealistic lighting, and visual artifacts. Very high guidance (15+) often produces images that look like they've been run through a sharpening filter: technically matching the prompt but aesthetically poor. The sweet spot depends on the model. As rough starting points, SD 1.5 works well at 7–9, SDXL at 5–8, and Flux at 3–5.

Dynamic and Negative CFG

Advanced techniques vary guidance during generation, for example starting with high guidance (to establish composition) and reducing it in later steps (to refine details naturally). Negative CFG (guidance scale below 0) inverts the prompt's effect, pushing the image away from what's described. Values between 0 and 1 do not invert anything; they only weaken the prompt's influence. Negative guidance is useful for understanding what the model associates with specific concepts but rarely useful for actual image generation.
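One simple way to realize a dynamic schedule is a linear decay from a high starting scale to a lower final one. The sketch below is an assumption for illustration only: the function name guidance_at_step and the default values 9.0 and 4.0 are hypothetical, and real implementations may use other curves (cosine, step-wise) or apply guidance only during part of the sampling process.

```python
def guidance_at_step(step, num_steps, start_scale=9.0, end_scale=4.0):
    """Linearly interpolate the guidance scale across denoising steps.

    step is 0-based; step 0 gets start_scale, the last step gets end_scale.
    """
    if num_steps < 2:
        return start_scale
    t = step / (num_steps - 1)  # progress through sampling, 0.0 to 1.0
    return start_scale + t * (end_scale - start_scale)

# Scale used at each of 5 denoising steps.
schedule = [guidance_at_step(i, 5) for i in range(5)]
print(schedule)
```

In a sampling loop, the value returned for the current step would be used as the scale when combining the conditional and unconditional predictions.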
