The process has four steps: (1) provide an original image; (2) create a mask marking the region to regenerate; (3) optionally provide a text prompt describing what should appear in the masked region; (4) run the model, which denoises only the masked area and keeps the unmasked area fixed. The model sees the entire image, masked and unmasked regions alike, during generation, so it can use the surrounding context to match lighting, perspective, and style.
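One common way to keep the unmasked area fixed is to blend the model's output with the original after each denoising step. The sketch below illustrates that blend on a toy array; it is not any particular library's implementation. The function name `composite_step` and the 4x4 test data are hypothetical, and real pipelines usually apply the same idea in latent space, or feed the mask to the model as an extra input instead.

```python
import numpy as np

def composite_step(denoised, known, mask):
    """Take masked pixels from `denoised` and all other pixels from `known`.

    `mask` is 1.0 where content should be regenerated and 0.0 where the
    original must be preserved. Fractional values blend the two sources.
    """
    return mask * denoised + (1.0 - mask) * known

# Toy 4x4 grayscale "image": the original is 0.2 everywhere,
# and the model's proposal is 0.9 everywhere.
original = np.full((4, 4), 0.2)
proposal = np.full((4, 4), 0.9)

mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0  # regenerate only the central 2x2 block

result = composite_step(proposal, original, mask)
```

After the call, the central 2x2 block of `result` holds the proposed values and every other pixel is unchanged from `original`.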
Outpainting extends the image canvas: imagine taking a portrait photo and extending it to show the full room. The model generates new content at the borders that is consistent with the existing image. This is useful for changing aspect ratios (turning a square image into a landscape-format one), adding context to cropped images, and creating panoramic views from single photos. Quality depends on how much context the original image provides: the less of the scene is visible near the borders, the more the model has to invent.
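In practice, outpainting can be set up as inpainting on a larger canvas: place the original in the middle and mask everything around it. The following is a minimal sketch of that preparation step, assuming a NumPy image array; the helper name `prepare_outpaint`, the centered placement, and the zero fill are illustrative choices, not a fixed convention.

```python
import numpy as np

def prepare_outpaint(image, target_w, target_h):
    """Center `image` (H x W x C) on a larger canvas and build its mask.

    Returns (canvas, mask). The mask is 1 for new border pixels that the
    model should generate and 0 for original pixels that must be kept.
    """
    h, w = image.shape[:2]
    if target_w < w or target_h < h:
        raise ValueError("target canvas must be at least as large as the image")

    top = (target_h - h) // 2
    left = (target_w - w) // 2

    # Empty canvas with the same channel layout and dtype as the input.
    canvas = np.zeros((target_h, target_w) + image.shape[2:], dtype=image.dtype)
    canvas[top:top + h, left:left + w] = image

    # Everything is "to be generated" except the pasted original.
    mask = np.ones((target_h, target_w), dtype=np.uint8)
    mask[top:top + h, left:left + w] = 0
    return canvas, mask

# Turn a 512x512 square into a roughly 16:9 canvas.
square = np.full((512, 512, 3), 128, dtype=np.uint8)
canvas, mask = prepare_outpaint(square, target_w=912, target_h=512)
```

The resulting `canvas` and `mask` can then be passed to whatever inpainting model is in use.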
For clean inpainting results: make the mask slightly larger than the area you want to change (the model handles transitions better with some overlap); provide a descriptive prompt for the replacement content; choose a denoising strength that fits the task (0.7–0.9 for replacing content, 0.3–0.5 for subtle modifications); and feather the mask edges rather than leaving them sharp, so the new content blends seamlessly.
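The two mask tips, growing the mask and feathering its edges, can be sketched with plain NumPy. This is an illustration under stated assumptions: `dilate` and `feather` are hypothetical helpers, the radius of 2 pixels is arbitrary, and the box blur stands in for the Gaussian blur that image editors and libraries more commonly use.

```python
import numpy as np

def dilate(mask, radius):
    """Grow a binary mask by `radius` pixels using a square neighbourhood."""
    padded = np.pad(mask, radius, mode="constant")
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            # Each shifted view contributes one neighbour of every pixel.
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def feather(mask, radius):
    """Soften mask edges with a box blur of width 2 * radius + 1."""
    padded = np.pad(mask.astype(np.float64), radius, mode="edge")
    h, w = mask.shape
    k = 2 * radius + 1
    acc = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (k * k)

mask = np.zeros((32, 32), dtype=np.float64)
mask[12:20, 12:20] = 1.0         # tight mask around the object

grown = dilate(mask, radius=2)   # now covers rows and columns 10..21
soft = feather(grown, radius=2)  # edge values ramp between 0 and 1
```

The interior of `soft` stays at 1.0, so the object is still fully regenerated, while the ramp at the border lets new and original pixels mix gradually.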