
Image-to-Image

img2img, Image Conditioning
Generate a new image from an existing image plus a text prompt. Instead of starting from pure noise (as in text-to-image), the diffusion process starts from a noised version of the input image, preserving its structure while modifying it according to the prompt. "A cyberpunk version of this photo" keeps the composition but changes the style and details.

Why It Matters

Image-to-image bridges photography and AI art. It lets you start from a sketch, a photo, or existing artwork, keeping the layout and composition while the AI changes the style, adds detail, or reimagines the content. It is more controllable than text-to-image because you guide the output with visual structure, not just words.

Deep Dive

The mechanism: take the input image, encode it to latent space (via the VAE encoder), add noise proportional to a "denoising strength" parameter (0.0 = no change, 1.0 = pure noise = text-to-image), then denoise conditioned on the text prompt. At strength 0.3, the output closely resembles the input with subtle modifications. At strength 0.8, it's largely reimagined but keeps the basic composition.
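The mechanism above can be sketched in a few lines of NumPy. The `img2img_start` helper and the linear beta schedule are illustrative assumptions for this article, not any real library's API:

```python
import numpy as np

def img2img_start(latent, strength, num_steps=50, seed=0):
    """Compute where img2img begins denoising (illustrative sketch).

    strength 0.0 -> return the latent untouched (no steps to run);
    strength 1.0 -> start from an almost pure-noise latent (text-to-image).
    """
    rng = np.random.default_rng(seed)
    t_start = int(strength * num_steps)   # how many denoising steps will run
    if t_start == 0:
        return latent, 0
    # toy linear beta schedule; real models use tuned schedules
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)   # cumulative signal retention
    a = alpha_bar[t_start - 1]
    noise = rng.standard_normal(latent.shape)
    # DDPM-style forward process q(x_t | x_0): scale the signal, add scaled noise
    return np.sqrt(a) * latent + np.sqrt(1.0 - a) * noise, t_start
```

The denoiser then runs only the remaining `t_start` steps, conditioned on the prompt, which is why low strengths change so little of the input.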

Denoising Strength

The denoising strength is the key parameter: it controls how much the output can deviate from the input.

Low strength (0.2–0.4): minor style changes, color adjustments, subtle detail additions.
Medium strength (0.5–0.7): significant style transformation while preserving composition.
High strength (0.8–1.0): major reimagining, only vague structural similarity to the input.

Finding the right strength for your use case requires experimentation.
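These regimes can be made concrete by computing how much weight the initial noising keeps on the input latent at each strength. The linear beta schedule here is a toy assumption, so the printed numbers are illustrative, not any specific model's:

```python
import numpy as np

num_steps = 50
betas = np.linspace(1e-4, 0.02, num_steps)  # toy schedule, not a real model's
alpha_bar = np.cumprod(1.0 - betas)

signal = {}
for strength in (0.2, 0.4, 0.6, 0.8, 1.0):
    t = int(strength * num_steps)           # steps that will be denoised
    # sqrt(alpha_bar) is the coefficient on the original latent in x_t
    signal[strength] = float(np.sqrt(alpha_bar[t - 1]))
    print(f"strength {strength:.1f} -> weight kept on input {signal[strength]:.2f}")
```

The weight falls monotonically with strength, which is exactly the low/medium/high behavior described above.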

Sketch-to-Image

A powerful img2img workflow: draw a rough sketch (even in MS Paint), use it as the input image with medium-high denoising strength, and describe the desired output. The sketch provides spatial layout (where objects are, their relative sizes) while the AI fills in all the artistic detail. This makes AI image generation accessible to anyone who can draw a stick figure — the composition comes from you, the rendering from the AI.
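A minimal numeric sketch of why this works: place a single blob (standing in for the rough sketch) in a latent, noise it at medium strength the way img2img would, and the blob region still carries more signal than the background, so the denoiser has a layout to follow. The arrays and schedule here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sketch = np.zeros((16, 16))
sketch[2:7, 2:7] = 1.0                       # the one shape the artist drew

strength, num_steps = 0.6, 50
betas = np.linspace(1e-4, 0.02, num_steps)   # toy schedule
alpha_bar = np.cumprod(1.0 - betas)
a = alpha_bar[int(strength * num_steps) - 1]

# noise the sketch the way img2img noises its input latent
noisy = np.sqrt(a) * sketch + np.sqrt(1.0 - a) * rng.standard_normal(sketch.shape)

blob_mean = noisy[2:7, 2:7].mean()           # where the sketch had content
rest_mean = noisy[sketch == 0].mean()        # empty background
```

Even after noising, `blob_mean` stays well above `rest_mean`: the spatial layout survives, while all the fine detail is left for the model to invent.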
