
ControlNet

An architecture that adds spatial control to image generation models. Instead of only describing what you want in text ("a person standing"), ControlNet lets you specify how: you supply an edge map, depth map, pose skeleton, or segmentation map to guide the composition. The generated image follows the spatial structure of your control input, while the text prompt fills in the details.
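A control input like an edge map is just an image-shaped array that the model conditions on. As a hedged illustration, here is a crude edge detector built from gradient-magnitude thresholding; a real pipeline would use a proper Canny preprocessor, and the function name and threshold are hypothetical:

```python
import numpy as np

def edge_map(img, threshold=0.2):
    """Crude edge detector: thresholded gradient magnitude.
    A stand-in for a real Canny preprocessor (hypothetical sketch)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return (mag > threshold * mag.max()).astype(np.uint8) * 255

# Toy image: dark left half, bright right half -> a vertical edge in the middle.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = edge_map(img)  # 255 along the brightness boundary, 0 elsewhere
```

The resulting binary map is what gets fed to a ControlNet trained on edges: white pixels mark structure the generated image should respect.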

Why It Matters

ControlNet makes AI image generation usable in professional workflows. Without it, you get a random composition and hope for the best. With it, you specify the exact pose, layout, or structure you need. That is the difference between "generate something vaguely like what I want" and "generate exactly this composition with these details", which is essential for design, advertising, and production work.

Deep Dive

ControlNet (Zhang et al., 2023) works by creating a trainable copy of the diffusion model's encoder and connecting it to the original model via zero-initialized convolution layers. The control signal (edge map, pose, depth) is processed by this copy, and the features are added to the main model's corresponding layers. The zero initialization means the control starts with no effect and gradually learns to guide generation during training, preserving the original model's quality.
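The zero-initialization trick can be sketched in a few lines of PyTorch. This is a minimal illustration, not the full ControlNet architecture; the class name and shapes are hypothetical:

```python
import torch
import torch.nn as nn

class ZeroConv2d(nn.Conv2d):
    """1x1 convolution whose weight and bias start at zero, so its
    output is exactly zero at initialization (hypothetical sketch)."""
    def __init__(self, channels):
        super().__init__(channels, channels, kernel_size=1)
        nn.init.zeros_(self.weight)
        nn.init.zeros_(self.bias)

# Features from the frozen model and from the trainable encoder copy
# that processed the control signal (random stand-ins here).
frozen_features = torch.randn(1, 64, 32, 32)
control_features = torch.randn(1, 64, 32, 32)

zero_conv = ZeroConv2d(64)
combined = frozen_features + zero_conv(control_features)

# Before any training step, the zero conv contributes nothing,
# so the frozen model's behavior is untouched.
assert torch.equal(combined, frozen_features)
```

Because the control branch starts as an identity on the frozen model's output, training can only move generation away from the original distribution gradually, which is why the base model's quality is preserved.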

Control Types

Common control inputs: Canny edges (outline structure), OpenPose (human body pose), depth maps (3D structure), segmentation maps (which region is what), normal maps (surface orientation), and scribbles (rough sketches). Each control type requires a separately trained ControlNet. Multiple controls can be combined: a pose skeleton plus an edge map gives you both body position and structural details.
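Combining multiple controls amounts to summing each ControlNet's feature residuals into the main model, typically scaled by a per-control conditioning weight. A minimal sketch, assuming the residuals are already computed (the function name and weights are hypothetical):

```python
import torch

def combine_controls(base, residuals, weights):
    """Add each control's feature residual to the base features,
    scaled by its conditioning weight (hypothetical sketch)."""
    out = base.clone()
    for residual, weight in zip(residuals, weights):
        out = out + weight * residual
    return out

# Stand-in feature maps for a pose ControlNet and an edge ControlNet.
base = torch.zeros(1, 4, 8, 8)
pose_residual = torch.ones(1, 4, 8, 8)
edge_residual = torch.full((1, 4, 8, 8), 2.0)

# Weight the pose guidance at 0.8 and the edge guidance at 0.5.
out = combine_controls(base, [pose_residual, edge_residual], [0.8, 0.5])
```

Lowering a control's weight relaxes how strictly the output follows that guide, which is how the pose/edge balance is tuned in practice.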

IP-Adapter and Beyond

Beyond spatial control, techniques like IP-Adapter provide style control: give a reference image and generate new images in the same style. T2I-Adapter is a lighter alternative to ControlNet that achieves similar control with fewer parameters. The trend is toward increasingly precise, composable control — specifying exactly what you want through a combination of text, spatial guides, style references, and iterative refinement.

Related Concepts

← All Terms
← Contrastive Learning | Copyright in AI →