Zubnet AI Learning Wiki › ControlNet
Models

ControlNet

An architecture that adds spatial control to image generation models. Instead of only describing what you want in text ("a person standing"), ControlNet lets you specify how: supply an edge map, depth map, pose skeleton, or segmentation map to guide the composition. The generated image follows the spatial structure of your control input while the text prompt fills in the details.

Why It Matters

ControlNet makes AI image generation usable in professional workflows. Without it, you get a random composition and hope the result is good. With it, you specify the exact pose, layout, or structure you need. It is the difference between "generate something vaguely like what I want" and "generate exactly this composition with these details," which is essential for design, advertising, and production work.

Deep Dive

ControlNet (Zhang et al., 2023) works by creating a trainable copy of the diffusion model's encoder and connecting it to the original model via zero-initialized convolution layers. The control signal (edge map, pose, depth) is processed by this copy, and the features are added to the main model's corresponding layers. The zero initialization means the control starts with no effect and gradually learns to guide generation during training, preserving the original model's quality.
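The zero-initialization idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the actual implementation: the 1x1 "zero convolution" is modeled as a per-channel linear map, and all array names are hypothetical.

```python
import numpy as np

def zero_conv(x, weight, bias):
    # 1x1 "zero convolution": a per-channel linear map over feature maps.
    # x: (channels, h, w); weight: (channels, channels); bias: (channels,)
    return np.tensordot(weight, x, axes=([1], [0])) + bias[:, None, None]

channels = 4
# Zero-initialized weights: the control branch contributes nothing at first.
weight = np.zeros((channels, channels))
bias = np.zeros(channels)

rng = np.random.default_rng(0)
base_features = rng.standard_normal((channels, 8, 8))     # frozen encoder features
control_features = rng.standard_normal((channels, 8, 8))  # trainable-copy features

# ControlNet adds the zero-conv output of the control branch to the base features.
combined = base_features + zero_conv(control_features, weight, bias)

# At initialization the control branch is a no-op, so the original model's
# behavior (and output quality) is preserved; training gradually moves the
# weights away from zero, letting the control signal take effect.
assert np.allclose(combined, base_features)
```

Because the added term starts at exactly zero, fine-tuning cannot degrade the pretrained model on step one; influence is learned, not imposed.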

Control Types

Common control inputs: Canny edges (outline structure), OpenPose (human body pose), depth maps (3D structure), segmentation maps (which region is what), normal maps (surface orientation), and scribbles (rough sketches). Each control type requires a separately trained ControlNet. Multiple controls can be combined: a pose skeleton plus an edge map gives you both body position and structural details.
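To make the edge-map control type concrete, here is a tiny gradient-magnitude edge detector in NumPy, a simplified stand-in for the Canny preprocessor that converts a source image into the outline map a Canny ControlNet conditions on (the function name and threshold are illustrative):

```python
import numpy as np

def edge_map(img, threshold=0.5):
    # Central-difference gradients (a rough stand-in for Canny edge detection).
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    # Keep pixels where the gradient magnitude exceeds the threshold.
    return (np.hypot(gx, gy) > threshold).astype(np.uint8)

# A white square on black: edges appear only along the square's border,
# which is exactly the structural outline a ControlNet would follow.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
edges = edge_map(img)
assert edges[4, 8] == 1   # on the border: edge
assert edges[8, 8] == 0   # inside the square: flat, no edge
```

In practice the preprocessor (e.g. OpenCV's Canny detector) runs on a reference photo, and the resulting binary outline image is passed alongside the text prompt so generation keeps the reference's composition.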

IP-Adapter and Beyond

Beyond spatial control, techniques like IP-Adapter provide style control: give a reference image and generate new images in the same style. T2I-Adapter is a lighter alternative to ControlNet that achieves similar control with fewer parameters. The trend is toward increasingly precise, composable control — specifying exactly what you want through a combination of text, spatial guides, style references, and iterative refinement.

Related Concepts

← All Terms
← Contrastive Learning | Copyright in AI →