
ControlNet

An architecture that adds spatial control to image generation models. Instead of only describing in text what you want ("a person standing"), ControlNet lets you specify how: you provide an edge map, depth map, pose skeleton, or segmentation map that guides the composition. The generated image follows the spatial structure of your control input, while the text prompt fills in the details.

Why It Matters

ControlNet made AI image generation usable for professional workflows. Without it, you get random compositions and hope for the best. With it, you specify the exact pose, layout, or structure you need. That is the difference between "generate something vaguely like what I want" and "generate exactly this composition with these details", which is critical for design, advertising, and production work.

Deep Dive

ControlNet (Zhang et al., 2023) works by creating a trainable copy of the diffusion model's encoder and connecting it to the original model via zero-initialized convolution layers. The control signal (edge map, pose, depth) is processed by this copy, and the features are added to the main model's corresponding layers. The zero initialization means the control starts with no effect and gradually learns to guide generation during training, preserving the original model's quality.
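The zero-initialization property described above can be sketched in a few lines. This is an illustrative toy using NumPy, not the actual implementation: the 1x1 "zero convolution" is reduced to a matrix multiply over channels, and the feature shapes are arbitrary. With zero-initialized weights, the control branch contributes nothing before training, so the combined output matches the frozen base model exactly:

```python
import numpy as np

def zero_conv(features, weight, bias):
    """A 1x1 convolution expressed as a matrix multiply over the channel axis."""
    return features @ weight + bias

rng = np.random.default_rng(0)
base_features = rng.normal(size=(8, 16))     # frozen encoder output (toy)
control_features = rng.normal(size=(8, 16))  # trainable-copy output (toy)

# Zero-initialized weights: the hallmark of ControlNet's connection layers.
w = np.zeros((16, 16))
b = np.zeros(16)

combined = base_features + zero_conv(control_features, w, b)

# Before any training step, the control branch has no effect at all.
assert np.allclose(combined, base_features)
```

As training moves `w` and `b` away from zero, the control features gradually begin to steer generation, which is why the base model's output quality is preserved from the very first training step.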

Control Types

Common control inputs: Canny edges (outline structure), OpenPose (human body pose), depth maps (3D structure), segmentation maps (which region is what), normal maps (surface orientation), and scribbles (rough sketches). Each control type requires a separately trained ControlNet. Multiple controls can be combined: a pose skeleton plus an edge map gives you both body position and structural details.
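Each of these control inputs is an ordinary image produced by a preprocessor. As a rough illustration (a simple gradient-magnitude threshold, far cruder than the real Canny detector; the array sizes and threshold value here are arbitrary assumptions), an edge-map preprocessor can be sketched as:

```python
import numpy as np

def edge_map(gray, threshold=0.5):
    """Binary edge map from horizontal/vertical intensity differences.

    A stand-in for a real Canny preprocessor: compute a gradient
    magnitude, then hard-threshold it into the black-and-white
    control image a ControlNet consumes.
    """
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:] = gray[:, 1:] - gray[:, :-1]  # horizontal differences
    gy[1:, :] = gray[1:, :] - gray[:-1, :]  # vertical differences
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)

# Toy image: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

edges = edge_map(img)
# Edges fire at the square's border but not in its flat interior.
assert edges[2, 2] == 1 and edges[4, 4] == 0
```

In practice the preprocessing step matters: the generated image will follow whatever structure the control map contains, so a noisy or overly dense edge map constrains the model in unintended ways.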

IP-Adapter and Beyond

Beyond spatial control, techniques like IP-Adapter provide style control: give a reference image and generate new images in the same style. T2I-Adapter is a lighter alternative to ControlNet that achieves similar control with fewer parameters. The trend is toward increasingly precise, composable control — specifying exactly what you want through a combination of text, spatial guides, style references, and iterative refinement.
