Zubnet AILearnWiki › Image Segmentation
Using AI

Image Segmentation

Semantic Segmentation, SAM, Instance Segmentation
Classifying every pixel in an image into a category. Semantic segmentation labels pixels by class (road, sidewalk, building, sky). Instance segmentation distinguishes individual objects (person 1, person 2). Panoptic segmentation does both. Meta's SAM (Segment Anything Model) can segment any object from a point click or text prompt, without task-specific training.

Why it matters

Segmentation provides the most precise understanding of image content. Self-driving cars need pixel-level road boundaries, not just bounding boxes. Medical imaging needs exact tumor boundaries. Photo editing needs precise object masks for background removal. SAM's ability to segment any object with zero training made this previously specialized capability accessible to everyone.

Deep Dive

Traditional segmentation models (U-Net for medical images, DeepLab for general scenes) are trained on specific categories and produce fixed-class outputs. They work well within their training domain but can't segment novel objects. SAM (Kirillov et al., 2023, Meta) changed this by training on 1 billion masks across 11 million images, learning a general notion of "objectness" that transfers to any domain without fine-tuning.

SAM and Its Impact

SAM takes a prompt (a point click, a bounding box, or text) and produces a segmentation mask for the indicated object. It works on images it has never seen, for object types it was never specifically trained on — microscopy images, satellite photos, artwork. SAM 2 extended this to video, maintaining consistent object segmentation across frames. The impact: tasks that previously required domain-specific training and expensive annotation now work out of the box.

Applications

Medical imaging: segmenting tumors, organs, and cells for diagnosis and treatment planning. Autonomous driving: understanding the drivable surface, lane markings, and obstacles at pixel level. Photo/video editing: precise background removal, object selection, and compositing. Agriculture: analyzing crop health from aerial imagery. Robotics: understanding object boundaries for grasping and manipulation.

Related Concepts

← All Terms
ESC