
Image-to-Image

img2img, Image Conditioning
Generate a new image from an existing image plus a text prompt. Instead of starting from pure noise (as in text-to-image), the diffusion process starts from a noised version of the input image, preserving its structure while modifying it according to the prompt. "A cyberpunk version of this photo" keeps the composition but changes the style and details.

Why It Matters

Image-to-image bridges photography and AI art. It lets you use a sketch, photo, or existing artwork as a starting point, preserving layout and composition while the AI changes the style, adds detail, or reimagines the content. It is more controllable than text-to-image because you guide the output with visual structure, not just words.

Deep Dive

The mechanism: take the input image, encode it to latent space (via the VAE encoder), add noise proportional to a "denoising strength" parameter (0.0 = no change, 1.0 = pure noise = text-to-image), then denoise conditioned on the text prompt. At strength 0.3, the output closely resembles the input with subtle modifications. At strength 0.8, it's largely reimagined but keeps the basic composition.
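The mechanism can be sketched in a few lines of numpy. This is a deliberately simplified model: a linear latent/noise mix stands in for the scheduler's actual forward-noising formula, and `img2img_start` is a hypothetical helper, not a real pipeline API.

```python
import numpy as np

def img2img_start(latent, strength, num_steps=50, seed=0):
    """Sketch of img2img initialization (hypothetical helper):
    mix the VAE-encoded latent with Gaussian noise in proportion to the
    denoising strength, and report which step the sampler starts from."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(latent.shape)
    # strength 0.0 -> keep the latent untouched; 1.0 -> pure noise
    # (equivalent to starting a text-to-image run from scratch)
    noisy = (1.0 - strength) * latent + strength * noise
    # only the last `strength` fraction of the schedule is actually run,
    # so low strength also means fewer denoising steps
    start_step = num_steps - int(num_steps * strength)
    return noisy, start_step

latent = np.zeros((4, 8, 8))  # stand-in for a VAE-encoded image latent
noisy, start = img2img_start(latent, strength=0.3)
```

At strength 0.3 with 50 steps, denoising starts at step 35, so only 15 steps of change are applied; at strength 1.0 it starts at step 0, which is exactly text-to-image.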

Denoising Strength

The denoising strength is the key parameter: it controls how much the output can deviate from the input. Low strength (0.2–0.4): minor style changes, color adjustments, subtle detail additions. Medium strength (0.5–0.7): significant style transformation while preserving composition. High strength (0.8–1.0): major reimagining, only vague structural similarity to the input. Finding the right strength for your use case requires experimentation.
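The bands above can be encoded as a small lookup for experimentation. The cutoffs are the rough ranges from this section, not standard values, and `strength_band` is a hypothetical helper:

```python
def strength_band(strength):
    """Map a denoising strength to the rough behavior bands described
    above (hypothetical helper; thresholds are approximate)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("denoising strength must be in [0, 1]")
    if strength < 0.5:
        return "low: minor style changes, color tweaks, subtle detail"
    if strength < 0.8:
        return "medium: strong restyle, composition preserved"
    return "high: major reimagining, only vague structural similarity"
```

In practice a common workflow is to sweep a batch of outputs across strengths (say 0.3 to 0.8 in steps of 0.1) and pick the band that matches your intent.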

Sketch-to-Image

A powerful img2img workflow: draw a rough sketch (even in MS Paint), use it as the input image with medium-high denoising strength, and describe the desired output. The sketch provides spatial layout (where objects are, their relative sizes) while the AI fills in all the artistic detail. This makes AI image generation accessible to anyone who can draw a stick figure — the composition comes from you, the rendering from the AI.
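One practical detail in this workflow: the sketch usually has to be resized to dimensions the model accepts. Latent-diffusion pipelines typically require sides that are multiples of 8, since the VAE downsamples by a factor of 8. A stdlib-only sketch of that preprocessing (`fit_for_img2img` is a hypothetical helper, and `max_side=1024` is just an illustrative limit):

```python
def fit_for_img2img(width, height, multiple=8, max_side=1024):
    """Scale a sketch to fit within max_side and snap both sides down
    to the nearest multiple of `multiple` (hypothetical helper)."""
    scale = min(1.0, max_side / max(width, height))
    w = int(width * scale) // multiple * multiple
    h = int(height * scale) // multiple * multiple
    return w, h
```

For example, a 1000x500 sketch stays at 1000 wide but is snapped to 496 tall, so the pipeline's VAE can downsample it cleanly.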
