
Image Segmentation

Semantic Segmentation, SAM, Instance Segmentation
Classifies every pixel in an image into a category. Semantic segmentation labels pixels by class (road, sidewalk, building, sky). Instance segmentation distinguishes individual objects (person 1, person 2). Panoptic segmentation does both. Meta's SAM (Segment Anything Model) can segment any object from a point click or text prompt, without task-specific training.
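The three output formats above can be made concrete with tiny label grids. This is a minimal sketch with made-up class and instance ids, not any particular model's output format:

```python
import numpy as np

# Toy 4x4 scene. Hypothetical class ids: 0 = sky, 1 = road, 2 = person.
semantic = np.array([
    [0, 0, 0, 0],
    [2, 0, 0, 2],
    [2, 1, 1, 2],
    [1, 1, 1, 1],
])

# Instance segmentation additionally separates the two people:
# 0 = no instance, 1 = person #1, 2 = person #2.
instance = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 2],
    [1, 0, 0, 2],
    [0, 0, 0, 0],
])

# Panoptic output pairs each pixel with (class, instance): "stuff" classes
# like road and sky share instance id 0, each person keeps its own id.
panoptic = np.stack([semantic, instance], axis=-1)

print(np.unique(semantic))   # classes present in the scene
print((instance == 1).sum(), "pixels belong to person #1")
```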

Why It Matters

Segmentation provides the most precise understanding of image content. Self-driving cars need pixel-level road boundaries, not just bounding boxes. Medical imaging needs exact tumor contours. Photo editing needs precise object masks to remove backgrounds. SAM's ability to segment any object with zero task-specific training has made this once-specialized capability accessible to everyone.

Deep Dive

Traditional segmentation models (U-Net for medical images, DeepLab for general scenes) are trained on specific categories and produce fixed-class outputs. They work well within their training domain but can't segment novel objects. SAM (Kirillov et al., 2023, Meta) changed this by training on 1 billion masks across 11 million images, learning a general notion of "objectness" that transfers to any domain without fine-tuning.

SAM and Its Impact

SAM takes a prompt (a point click, a bounding box, or text) and produces a segmentation mask for the indicated object. It works on images it has never seen, for object types it was never specifically trained on — microscopy images, satellite photos, artwork. SAM 2 extended this to video, maintaining consistent object segmentation across frames. The impact: tasks that previously required domain-specific training and expensive annotation now work out of the box.
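Running SAM itself requires downloading model weights, but the prompt-in, mask-out interface it exposes can be illustrated with a toy stand-in: grow a mask outward from a clicked point over similar neighboring pixels. This is only an analogy for the interaction pattern (SAM learns a far richer, learned notion of objectness); the function name and tolerance parameter here are invented for illustration:

```python
import numpy as np
from collections import deque

def point_prompt_mask(image, seed, tol=10):
    """Toy analogue of point-prompted segmentation: flood-fill the
    connected region whose intensity is within `tol` of the seed pixel.
    Mimics only SAM's interface (point in, binary mask out)."""
    h, w = image.shape
    seed_val = int(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if mask[y, x] or abs(int(image[y, x]) - seed_val) > tol:
            continue
        mask[y, x] = True
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                queue.append((ny, nx))
    return mask

# A bright 2x2 "object" on a dark background; "click" inside it.
img = np.zeros((5, 5), dtype=np.uint8)
img[1:3, 1:3] = 200
mask = point_prompt_mask(img, seed=(1, 1))
print(mask.sum())  # → 4 pixels segmented
```

The real model replaces the intensity-similarity rule with an image encoder and a mask decoder conditioned on the prompt, which is what lets it generalize to microscopy, satellite imagery, and artwork.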

Applications

Medical imaging: segmenting tumors, organs, and cells for diagnosis and treatment planning.
Autonomous driving: understanding the drivable surface, lane markings, and obstacles at pixel level.
Photo/video editing: precise background removal, object selection, and compositing.
Agriculture: analyzing crop health from aerial imagery.
Robotics: understanding object boundaries for grasping and manipulation.
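The photo-editing use case reduces to simple array compositing once a mask exists: keep pixels where the mask is true, replace the rest. A minimal sketch with a made-up image and mask (in practice the mask would come from a model such as SAM):

```python
import numpy as np

# Hypothetical 3x3 RGB image and a binary foreground mask.
image = np.full((3, 3, 3), 120, dtype=np.uint8)  # uniform gray scene
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True                                # the selected object pixel

# Background removal: keep foreground pixels, paint the rest white.
background = np.full_like(image, 255)
cutout = np.where(mask[..., None], image, background)

print(cutout[1, 1])  # object pixel kept:      [120 120 120]
print(cutout[0, 0])  # background replaced:    [255 255 255]
```

`mask[..., None]` adds a trailing axis so the (3, 3) boolean mask broadcasts across the three color channels.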
