Using AI

Image Segmentation

Semantic Segmentation, SAM, Instance Segmentation
Classifying every pixel in an image into a category. Semantic segmentation labels pixels by class (road, sidewalk, building, sky). Instance segmentation distinguishes individual objects (person 1, person 2). Panoptic segmentation does both. Meta's SAM (Segment Anything Model) can segment almost any object from a point click or a bounding box, without task-specific training; text prompts were demonstrated only experimentally.
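To make the three output types concrete, here is a minimal sketch in plain Python. It assumes toy label maps stored as nested lists and a hypothetical three-class label set. The `class_id * 1000 + instance_id` encoding is one common convention (similar in spirit to the one Cityscapes uses), not the only way to store panoptic labels.

```python
# Hypothetical class ids for a toy 3x4 image (one list entry per pixel).
ROAD, PERSON, SKY = 0, 1, 2

# Semantic map: one class id per pixel. Both people share the PERSON id.
semantic = [
    [SKY,    SKY,  SKY,    SKY],
    [PERSON, ROAD, PERSON, ROAD],
    [PERSON, ROAD, PERSON, ROAD],
]

# Instance map: 0 = no instance (uncountable "stuff" like road or sky),
# 1, 2, ... = individual countable objects.
instance = [
    [0, 0, 0, 0],
    [1, 0, 2, 0],
    [1, 0, 2, 0],
]

def panoptic(semantic, instance):
    """Merge both maps into one id per pixel (assumed convention):
    class_id * 1000 + instance_id for object pixels, class_id otherwise."""
    return [
        [c * 1000 + i if i > 0 else c for c, i in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic, instance)
    ]

pan = panoptic(semantic, instance)
for row in pan:
    print(row)
```

Both person pixels share class 1 in the semantic map, but the panoptic map gives them different ids (1001 and 1002), which is exactly the information instance segmentation adds.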

Why it matters

Segmentation provides the most fine-grained understanding of image content: a label for every pixel. Self-driving cars need pixel-level road boundaries, not just bounding boxes. Medical imaging needs exact tumor boundaries. Photo editing needs precise object masks for background removal. SAM's ability to segment arbitrary objects without task-specific training (zero-shot) made this once-specialized capability broadly accessible.
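The gap between a box and a mask is easy to quantify. The sketch below uses a hypothetical diagonal mask (think of a lane marking crossing the frame) and measures how much of its tightest bounding box is actually object.

```python
# Hypothetical binary mask (1 = object pixel) of a thin diagonal object.
mask = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

def bounding_box(mask):
    """Smallest axis-aligned box covering all object pixels,
    returned as inclusive (row0, col0, row1, col1)."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return min(rows), min(cols), max(rows), max(cols)

r0, c0, r1, c1 = bounding_box(mask)
box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
object_area = sum(sum(row) for row in mask)
print(f"box covers {box_area} pixels, object covers {object_area}")
print(f"fraction of the box that is object: {object_area / box_area:.2f}")
```

Here only 9 of the 25 box pixels belong to the object, so a box-only representation would include 16 pixels that are not the marking at all.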

Deep Dive

Traditional segmentation models (U-Net for medical images, DeepLab for general scenes) are trained on a fixed set of categories and can only output those classes. They work well within their training domain but cannot segment objects outside their label set. SAM (Kirillov et al., 2023, Meta) changed this by training on the SA-1B dataset of over 1 billion masks across 11 million images, learning a general notion of "objectness" that transfers to many new domains without fine-tuning.
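The "fixed-class output" limitation follows from how such models produce their masks: the network emits one score per class per pixel, and the predicted label is the per-pixel argmax. A sketch with made-up logits for a hypothetical three-class label set (no real model is run here):

```python
# Hypothetical label set: the model has exactly one output channel per class.
CLASSES = ["background", "road", "car"]

# logits[c][y][x] = score for class c at pixel (y, x); toy 2x3 image, made-up numbers.
logits = [
    [[2.0, 0.1, 0.3], [1.5, 0.2, 0.0]],  # background
    [[0.5, 3.0, 0.2], [0.1, 2.5, 0.4]],  # road
    [[0.1, 0.4, 1.9], [0.3, 0.1, 2.2]],  # car
]

def predict_labels(logits):
    """Per-pixel argmax over the class axis -> H x W map of class indices."""
    n_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(n_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

labels = predict_labels(logits)
print([[CLASSES[c] for c in row] for row in labels])
```

Every pixel must land in one of the three channels: a pedestrian or a deer would still be labeled background, road, or car, because the model has no output for anything else.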

SAM and Its Impact

SAM takes a prompt (a point click, a bounding box, or a rough mask) and produces a segmentation mask for the indicated object; text prompts were explored in the original paper only as a proof of concept. It works on images it has never seen and on object types it was never explicitly trained to recognize, such as microscopy images, satellite photos, and artwork, though quality can drop on domains far from its training data. SAM 2 extended this to video, keeping an object's mask consistent across frames. The impact: many tasks that previously required domain-specific training and expensive annotation now work out of the box, or at least start from a strong baseline.
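SAM's interface (image plus point prompt in, mask out) can be illustrated with a classical stand-in. The sketch below uses intensity-based region growing, a flood fill from the clicked pixel. This is explicitly not SAM's method, which combines a learned image encoder, a prompt encoder, and a mask decoder; the function name, the `tolerance` parameter, and the toy image are illustrative assumptions.

```python
from collections import deque

def segment_from_click(image, click, tolerance=10):
    """Classical region growing (NOT SAM): return a binary mask of pixels
    4-connected to `click` whose intensity is within `tolerance` of the
    clicked pixel. Only mimics SAM's prompt -> mask interface."""
    height, width = len(image), len(image[0])
    cy, cx = click
    seed_value = image[cy][cx]
    mask = [[0] * width for _ in range(height)]
    mask[cy][cx] = 1
    queue = deque([(cy, cx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < height and 0 <= nx < width and not mask[ny][nx]
                    and abs(image[ny][nx] - seed_value) <= tolerance):
                mask[ny][nx] = 1
                queue.append((ny, nx))
    return mask

# Toy grayscale image: a bright 2x2 square on a dark background.
image = [
    [10, 12, 11, 10],
    [11, 200, 205, 12],
    [10, 198, 202, 11],
    [12, 10, 11, 10],
]

mask = segment_from_click(image, click=(1, 1))  # "click" on the bright square
for row in mask:
    print(row)
```

Clicking a background pixel instead, e.g. `segment_from_click(image, (0, 0))`, returns the dark ring around the square. Region growing breaks down as soon as an object's pixels vary more than the threshold, which is exactly where a learned notion of objectness helps. With Meta's released `segment_anything` package, the comparable step is `SamPredictor.predict(point_coords=..., point_labels=...)` after `set_image`; see the repository's README for setup.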

Applications

Medical imaging: segmenting tumors, organs, and cells for diagnosis and treatment planning. Autonomous driving: understanding the drivable surface, lane markings, and obstacles at pixel level. Photo/video editing: precise background removal, object selection, and compositing. Agriculture: analyzing crop health from aerial imagery. Robotics: understanding object boundaries for grasping and manipulation.
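Background removal and compositing both reduce to per-pixel blending with a mask. A minimal sketch on nested lists of RGB tuples (no image library assumed); the photo, mask values, and function name are hypothetical. A hard mask from a segmenter uses only 0 and 1, while fractional values model a soft, anti-aliased edge.

```python
def composite(foreground, background, alpha):
    """Per-pixel blend: out = alpha * fg + (1 - alpha) * bg for each RGB channel.
    alpha is an H x W mask with values in [0, 1]."""
    return [
        [
            tuple(round(a * f + (1 - a) * b) for f, b in zip(fg_px, bg_px))
            for fg_px, bg_px, a in zip(fg_row, bg_row, a_row)
        ]
        for fg_row, bg_row, a_row in zip(foreground, background, alpha)
    ]

# Left column: red object; right column: green background.
photo = [[(200, 30, 30), (40, 120, 40)],
         [(200, 30, 30), (40, 120, 40)]]
white = [[(255, 255, 255)] * 2 for _ in range(2)]
alpha = [[1, 0],
         [1, 0.25]]  # hypothetical mask; 0.25 = soft edge pixel

out = composite(photo, white, alpha)
for row in out:
    print(row)
```

Object pixels keep their color, background pixels become white, and the soft-edge pixel lands in between, which is why good mattes avoid hard, jagged cut-outs.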
