Using AI

Image Segmentation

Semantic Segmentation, SAM, Instance Segmentation
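Image segmentation assigns every pixel in an image to a category. Semantic segmentation labels pixels by class (road, sidewalk, building, sky). Instance segmentation separates individual objects (person 1, person 2). Panoptic segmentation does both. Meta's SAM (Segment Anything Model) can segment almost any object from a simple prompt such as a click or a box, with no task-specific training.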
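The three output types can be compared on a toy label map. This is a minimal sketch, not a real segmentation model: the class ids, the `SEMANTIC` grid, and the `label_instances` helper are all made up for illustration, and instances are found with plain connected-component labeling, which only works here because the two "person" regions do not touch.

```python
from collections import deque

# Toy 4x6 semantic label map: 0 = sky, 1 = road, 2 = person (hypothetical ids).
SEMANTIC = [
    [0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 2, 0],
    [1, 2, 1, 1, 2, 1],
    [1, 1, 1, 1, 1, 1],
]


def label_instances(semantic, target_class):
    """Give each 4-connected region of target_class its own id (1, 2, ...)."""
    height, width = len(semantic), len(semantic[0])
    instances = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic[r][c] != target_class or instances[r][c]:
                continue
            # Unvisited pixel of the target class: start a new instance.
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic[ny][nx] == target_class
                            and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id


# Instance view: the two people get ids 1 and 2; everything else stays 0.
person_ids, person_count = label_instances(SEMANTIC, target_class=2)

# Panoptic view: every pixel carries (class id, instance id).
# Background classes such as sky and road keep instance id 0.
panoptic = [
    [(cls, inst) for cls, inst in zip(sem_row, inst_row)]
    for sem_row, inst_row in zip(SEMANTIC, person_ids)
]
```

In the semantic map both people share the label 2; the instance map tells them apart, and the panoptic map keeps both pieces of information for every pixel.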

Why It Matters

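Segmentation provides the most precise understanding of image content. A self-driving car needs pixel-level road boundaries, not just bounding boxes. Medical imaging needs exact tumor boundaries. Photo editing needs accurate object masks for background removal. SAM's ability to segment almost any object with no additional training makes this once-specialized capability accessible to everyone.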

Deep Dive

Traditional segmentation models (U-Net for medical images, DeepLab for general scenes) are trained on specific categories and produce fixed-class outputs. They work well within their training domain but can't segment novel objects. SAM (Kirillov et al., 2023, Meta) changed this by training on more than 1 billion masks across 11 million images (the SA-1B dataset), learning a general notion of "objectness" that transfers to many domains without fine-tuning.
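What "fixed-class output" means can be sketched in a few lines. Assuming a model that emits one score per class for every pixel, the label map is a per-pixel argmax over those scores. The `CLASSES` list, the `scores` values, and the `argmax_label_map` helper below are hypothetical and stand in for the output of a trained network.

```python
# Hypothetical fixed label set the model was trained on.
CLASSES = ["road", "sidewalk", "building", "sky"]


def argmax_label_map(scores):
    """scores[r][c] holds one score per class; return the best class index per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Made-up per-class scores for a 2x2 image.
scores = [
    [[0.1, 0.2, 0.1, 0.6], [0.1, 0.1, 0.7, 0.1]],
    [[0.8, 0.1, 0.05, 0.05], [0.3, 0.5, 0.1, 0.1]],
]

label_map = argmax_label_map(scores)
names = [[CLASSES[index] for index in row] for row in label_map]
```

Every pixel has to land in one of the four listed classes. An object type that is not in `CLASSES` cannot receive its own label, which is the limitation promptable models address.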

SAM and Its Impact

SAM takes a prompt (a point click, a bounding box, or a rough mask; text prompting was only a proof of concept in the original paper) and produces a segmentation mask for the indicated object. It works on images it has never seen and on object types it was never specifically trained on, such as microscopy images, satellite photos, and artwork. SAM 2 (2024) extended this to video, maintaining consistent object segmentation across frames. The impact: tasks that previously required domain-specific training and expensive annotation now often work out of the box.
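The prompt-in, mask-out interface can be illustrated with a deliberately simple stand-in. The sketch below is not SAM and does not use its library: it swaps in classic region growing on grayscale values, and the `segment_from_point` function, its `tolerance` parameter, and the `IMAGE` grid are assumptions made for this example. It only shows the shape of the interaction: a clicked point goes in, a binary mask for the object under that point comes out.

```python
from collections import deque


def segment_from_point(image, point, tolerance=10):
    """Toy promptable segmenter: grow a region outward from the clicked pixel.

    image is a 2D list of grayscale values and point is (row, col).
    A neighbor joins the mask if its value is within `tolerance` of the seed.
    """
    height, width = len(image), len(image[0])
    seed_r, seed_c = point
    seed_value = image[seed_r][seed_c]
    mask = [[False] * width for _ in range(height)]
    mask[seed_r][seed_c] = True
    queue = deque([(seed_r, seed_c)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < height and 0 <= nc < width
                    and not mask[nr][nc]
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


# Bright background (200) with a dark 2x2 object and a mid-gray 2x1 object.
IMAGE = [
    [200, 200, 200, 200, 200],
    [200,  50,  55, 200, 200],
    [200,  52,  48, 200, 120],
    [200, 200, 200, 200, 120],
]

object_mask = segment_from_point(IMAGE, point=(1, 1))  # click on the dark object
other_mask = segment_from_point(IMAGE, point=(2, 4))   # click on the gray object
```

The same image yields a different mask depending on where the click lands, with no class list involved. A learned model replaces the intensity rule with features that generalize across object types.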

Applications

Medical imaging: segmenting tumors, organs, and cells for diagnosis and treatment planning. Autonomous driving: understanding the drivable surface, lane markings, and obstacles at pixel level. Photo/video editing: precise background removal, object selection, and compositing. Agriculture: analyzing crop health from aerial imagery. Robotics: understanding object boundaries for grasping and manipulation.
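Most of these applications consume the mask in a similar way once it exists. As a rough sketch, assuming a boolean mask and an RGB image stored as nested lists, background removal sets the alpha channel from the mask, and the masked area can be measured as a fraction of the image (for example, how much of an aerial tile is covered by a crop). The helper names `remove_background` and `mask_area_fraction` are made up for this example.

```python
def remove_background(rgb, mask):
    """Return RGBA pixels: fully opaque inside the mask, fully transparent outside."""
    return [
        [(r, g, b, 255 if keep else 0) for (r, g, b), keep in zip(rgb_row, mask_row)]
        for rgb_row, mask_row in zip(rgb, mask)
    ]


def mask_area_fraction(mask):
    """Fraction of pixels covered by the mask, between 0.0 and 1.0."""
    total = sum(len(row) for row in mask)
    covered = sum(1 for row in mask for keep in row if keep)
    return covered / total


# Made-up 2x2 image and mask.
rgb = [
    [(10, 20, 30), (40, 50, 60)],
    [(70, 80, 90), (100, 110, 120)],
]
mask = [
    [True, False],
    [False, True],
]

cutout = remove_background(rgb, mask)
coverage = mask_area_fraction(mask)
```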
