Using AI

Image-to-Image

img2img, Image Conditioning
Generating a new image based on an existing image plus a text prompt. Instead of starting from pure noise (text-to-image), the diffusion process starts from a noisy version of the input image, preserving its structure while modifying it according to the prompt. "A cyberpunk version of this photo" keeps the composition but transforms the style and details.

Why it matters

Image-to-image is the bridge between photography and AI art. It lets you use sketches, photos, or existing artwork as a starting point, maintaining layout and composition while the AI transforms style, adds detail, or reimagines the content. It's more controllable than text-to-image because you're guiding the output with visual structure, not just words.

Deep Dive

The mechanism: take the input image, encode it into latent space (via the VAE encoder), add noise in proportion to a "denoising strength" parameter (0.0 = no change, 1.0 = pure noise, which is effectively text-to-image), then denoise conditioned on the text prompt and decode the result back to pixels with the VAE decoder. At strength 0.3, the output closely resembles the input with subtle modifications. At strength 0.8, it is largely reimagined but keeps the basic composition.
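The noising step described above can be sketched in plain Python. This is a minimal illustration, not any particular library's implementation: the linear beta schedule, the 1000 training steps, and the names `alpha_bar_schedule` and `noise_latent` are all assumptions made for the example, and the "latent" is just a flat list of floats standing in for the VAE encoder's output.

```python
import math
import random


def alpha_bar_schedule(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_latent(latent, strength, alpha_bars, rng):
    """Noise a latent up to the timestep implied by `strength`.

    Returns (noisy_latent, t), where t is the number of noising steps applied.
    strength 0.0 leaves the latent untouched; strength 1.0 goes to the last
    timestep, where almost none of the original signal remains.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    t = round(strength * len(alpha_bars))
    if t == 0:
        return list(latent), 0
    a = alpha_bars[t - 1]
    signal_scale = math.sqrt(a)        # how much of the input survives
    noise_scale = math.sqrt(1.0 - a)   # how much Gaussian noise is mixed in
    noisy = [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in latent]
    return noisy, t
```

With this schedule, a strength of 0.3 stops at timestep 300 of 1000, where most of the input signal is still present, while 0.8 stops at timestep 800, where the noise dominates. The denoiser then starts from that timestep instead of from pure noise.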

Denoising Strength

The denoising strength is the key parameter: it controls how far the output can deviate from the input. As rough guides, low strength (0.2–0.4) gives minor style changes, color adjustments, and subtle detail additions. Medium strength (0.5–0.7) gives a significant style transformation while preserving composition. High strength (0.8–1.0) gives a major reimagining with only vague structural similarity to the input, and essentially none at 1.0. Finding the right strength for your use case requires experimentation.
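One practical consequence of the strength setting is that it also changes how much denoising work is done. A common convention, assumed here rather than taken from the text above, is that the sampler runs only about `strength × num_inference_steps` of its scheduled steps, because the process starts partway down the noise schedule. The helper name `denoising_steps` is hypothetical.

```python
def denoising_steps(num_inference_steps, strength):
    """Approximate number of denoising steps actually run at a given strength.

    Assumes the convention that the sampler skips the first
    (1 - strength) fraction of its schedule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return min(round(num_inference_steps * strength), num_inference_steps)
```

Under this assumption, a 50-step schedule at strength 0.5 runs 25 denoising steps, and at strength 1.0 it runs all 50, matching ordinary text-to-image.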

Sketch-to-Image

A powerful img2img workflow: draw a rough sketch (even in MS Paint), use it as the input image with medium-high denoising strength, and describe the desired output. The sketch provides spatial layout (where objects are, their relative sizes) while the AI fills in all the artistic detail. This makes AI image generation accessible to anyone who can draw a stick figure — the composition comes from you, the rendering from the AI.
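As a rough sketch of this workflow, the function below uses the Hugging Face `diffusers` library's `StableDiffusionImg2ImgPipeline`. Several things here are assumptions for illustration: the function name `sketch_to_image`, the default strength of 0.7 as a stand-in for "medium-high", the 512×512 resize (a size that suits Stable Diffusion 1.x checkpoints), and the `model_id` argument, which is a placeholder the caller must fill with a Stable Diffusion checkpoint they have access to.

```python
def sketch_to_image(sketch_path, prompt, model_id, strength=0.7, device="cuda"):
    """Turn a rough sketch into a rendered image with img2img.

    model_id is a placeholder for a Stable Diffusion checkpoint name or path.
    strength=0.7 is an assumed "medium-high" value; tune it by experiment.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must be in (0.0, 1.0]")

    # Imported lazily so this module loads even if the packages are missing.
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    # The sketch supplies the spatial layout; the resize is an assumed
    # default resolution, not a requirement of every model.
    init_image = Image.open(sketch_path).convert("RGB").resize((512, 512))

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id).to(device)
    result = pipe(prompt=prompt, image=init_image, strength=strength)
    return result.images[0]  # a PIL image
```

A call might look like `sketch_to_image("sketch.png", "a watercolor landscape with a cabin by a lake", model_id=...)`, followed by `.save("out.png")` on the returned image. If the result ignores the sketch's layout, lower the strength; if it looks too much like the raw sketch, raise it.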
