
Image Segmentation

Semantic Segmentation, SAM, Instance Segmentation
Classifying each pixel in an image into a category. Semantic segmentation labels pixels by class (road, sidewalk, building, sky). Instance segmentation distinguishes individual objects (person 1, person 2). Panoptic segmentation does both. Meta's SAM (Segment Anything Model) can segment any object from a click or a text prompt, with no task-specific training.
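The semantic/instance distinction can be shown on a toy mask: a semantic map stores one class id per pixel, and splitting connected regions of the same class yields instance ids. A minimal sketch in plain Python (the grid and class ids are invented for illustration, not output from any real model):

```python
from collections import deque

# Toy 5x5 semantic mask: 0 = background, 1 = "person".
# Two disconnected blobs of class 1 represent two different people.
semantic = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

def instances(mask, cls):
    """Split a semantic mask into instance ids via 4-connected flood fill."""
    h, w = len(mask), len(mask[0])
    inst = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] == cls and inst[y][x] == 0:
                next_id += 1                      # start a new instance
                inst[y][x] = next_id
                q = deque([(y, x)])
                while q:
                    cy, cx = q.popleft()
                    for ny, nx in ((cy-1,cx),(cy+1,cx),(cy,cx-1),(cy,cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == cls and inst[ny][nx] == 0):
                            inst[ny][nx] = next_id
                            q.append((ny, nx))
    return inst, next_id

inst, count = instances(semantic, cls=1)
print(count)  # 2 — the semantic class "person" splits into two instances
```

Real pipelines do this with learned models rather than flood fill, but the output contract is the same: per-pixel class ids for semantic, per-pixel instance ids for instance, and both together for panoptic.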

Why It Matters

Segmentation provides the most precise understanding of image content. Self-driving cars need pixel-level road boundaries, not just bounding boxes. Medical imaging needs exact tumor boundaries. Photo editing needs precise object masks to remove backgrounds. SAM's ability to segment any object with zero task-specific training made this formerly specialized capability accessible to everyone.

Deep Dive

Traditional segmentation models (U-Net for medical images, DeepLab for general scenes) are trained on specific categories and produce fixed-class outputs. They work well within their training domain but can't segment novel objects. SAM (Kirillov et al., 2023, Meta) changed this by training on 1 billion masks across 11 million images, learning a general notion of "objectness" that transfers to any domain without fine-tuning.

SAM and Its Impact

SAM takes a prompt (a point click, a bounding box, or text) and produces a segmentation mask for the indicated object. It works on images it has never seen, for object types it was never specifically trained on — microscopy images, satellite photos, artwork. SAM 2 extended this to video, maintaining consistent object segmentation across frames. The impact: tasks that previously required domain-specific training and expensive annotation now work out of the box.
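The point-prompt idea — click a pixel, get back a mask for the object under it — can be mimicked at toy scale with classic region growing. This is not SAM's method (SAM uses a learned image encoder and mask decoder); it is only a hedged illustration of the prompt-to-mask interface, with an invented image and tolerance:

```python
from collections import deque

# Toy grayscale "image": a bright object on a dark background.
img = [
    [10,  10,  10, 10, 10],
    [10, 200, 210, 10, 10],
    [10, 205, 215, 10, 10],
    [10,  10,  10, 10, 10],
]

def segment_from_click(img, y, x, tol=30):
    """Grow a mask outward from the clicked pixel, keeping 4-connected
    pixels whose intensity is within `tol` of the seed pixel."""
    h, w = len(img), len(img[0])
    seed = img[y][x]
    mask = [[False] * w for _ in range(h)]
    mask[y][x] = True
    q = deque([(y, x)])
    while q:
        cy, cx = q.popleft()
        for ny, nx in ((cy-1,cx),(cy+1,cx),(cy,cx-1),(cy,cx+1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and abs(img[ny][nx] - seed) <= tol):
                mask[ny][nx] = True
                q.append((ny, nx))
    return mask

mask = segment_from_click(img, 1, 1)  # "click" on the bright object
print(sum(v for row in mask for v in row))  # 4 pixels in the returned mask
```

SAM replaces the hand-tuned intensity tolerance with features learned from a billion masks, which is why it generalizes to microscopy, satellite imagery, and artwork where simple intensity rules fail.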

Applications

Medical imaging: segmenting tumors, organs, and cells for diagnosis and treatment planning. Autonomous driving: understanding the drivable surface, lane markings, and obstacles at pixel level. Photo/video editing: precise background removal, object selection, and compositing. Agriculture: analyzing crop health from aerial imagery. Robotics: understanding object boundaries for grasping and manipulation.
