
Object Detection

YOLO, Bounding Box Detection
Object detection identifies and localizes objects in images or video, drawing a bounding box around each one and classifying what the box contains: "there is a car at (x1, y1, x2, y2) and a person at (x3, y3, x4, y4)." Unlike image classification, which says only what is in an image, object detection says what is in the image and where it is, making counting, tracking, and spatial reasoning possible.
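A detector's output can be pictured as a list of labeled, scored boxes. The sketch below is a minimal illustration of that output format (the `Detection` class and the example values are hypothetical, not any library's actual API):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a class label, a confidence score,
    and a bounding box in (x1, y1, x2, y2) pixel coordinates."""
    label: str
    score: float
    box: tuple  # (x1, y1, x2, y2)

# Hypothetical output for one frame: a car and a person, each localized.
detections = [
    Detection("car", 0.92, (34, 120, 410, 310)),
    Detection("person", 0.88, (450, 80, 520, 300)),
]

# Unlike classification, we can now count and reason spatially.
cars = [d for d in detections if d.label == "car"]
print(len(cars))  # number of cars in the frame
```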

Why It Matters

Object detection is the technology behind self-driving cars (detecting pedestrians, vehicles, and signs), security cameras (person detection), retail analytics (counting customers), manufacturing quality control (detecting defects), and augmented reality (anchoring virtual objects to real ones). It is one of the most widely deployed computer vision capabilities in industry.

Deep Dive

The YOLO (You Only Look Once) family is the most popular real-time object detection architecture. YOLO divides the image into a grid, predicts bounding boxes and class probabilities for each grid cell in a single forward pass, and filters overlapping detections with non-maximum suppression (NMS). YOLOv8 and YOLO-World achieve real-time detection (30+ FPS) with high accuracy on consumer hardware. Two-stage detectors such as Faster R-CNN are generally more accurate but slower.
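The "filters overlapping detections" step is greedy non-maximum suppression: keep the highest-scoring box, then drop any remaining box that overlaps it too much, measured by intersection-over-union (IoU). A minimal pure-Python sketch of that filtering step (the example boxes are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: repeatedly keep the
    highest-scoring box and discard boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two near-duplicate detections of the same car, plus one person:
boxes = [(30, 120, 410, 310), (34, 118, 405, 305), (450, 80, 520, 300)]
scores = [0.92, 0.85, 0.88]
print(nms(boxes, scores))  # → [0, 2]: the duplicate car box is suppressed
```

Production detectors run the same logic per class, usually on the GPU, but the algorithm is exactly this.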

Beyond Bounding Boxes

Bounding boxes are rectangles — they approximate object location but include background. Instance segmentation (Mask R-CNN, SAM) produces pixel-level masks for each object. Panoptic segmentation labels every pixel as either a specific object instance or a background class. Keypoint detection identifies specific points on objects (joints on a human body for pose estimation). Each adds precision at the cost of compute.
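The relationship between a mask and a box is concrete: a pixel mask can always be collapsed to its tight bounding box, but the box necessarily includes background pixels the mask excludes. A small sketch with a toy boolean mask (the helper below is illustrative, not any library's API):

```python
def mask_to_box(mask):
    """Tight (x1, y1, x2, y2) bounding box around all truthy pixels
    in a 2D mask given as a list of rows."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

# An L-shaped object: 4 object pixels, but its tight box covers
# a 2x3 = 6-pixel region, so the box includes 2 background pixels.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(mask_to_box(mask))  # → (1, 0, 3, 3)
```

The reverse direction is the hard part, and is exactly what instance segmentation models pay extra compute for.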

Zero-Shot Detection

Traditional object detectors only find objects from their training categories. Zero-shot detectors (Grounding DINO, OWL-ViT, YOLO-World) can find any object described in natural language: "find all coffee cups" works even if the model never trained on coffee cups. This is possible because these models combine vision and language understanding, matching text descriptions to image regions. It's transformative for applications where the objects of interest change frequently.
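The core mechanism is that a text encoder and an image encoder map queries and image regions into one shared embedding space, and detection becomes nearest-neighbor matching by similarity. A toy sketch of that matching step, using hand-made 3-dimensional vectors as stand-ins for real learned embeddings (the embeddings and region names are invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins: in a real model (e.g. OWL-ViT or Grounding DINO),
# these would be high-dimensional outputs of trained encoders.
query = [0.9, 0.1, 0.2]  # embedding of the text "coffee cup"
region_embeddings = {
    "region_a": [0.88, 0.15, 0.25],  # region that looks like a cup
    "region_b": [0.05, 0.95, 0.10],  # region that looks like something else
}

scores = {name: cosine(query, emb) for name, emb in region_embeddings.items()}
best = max(scores, key=scores.get)
print(best)  # → region_a: closest to the text query in embedding space
```

Because matching happens in a shared space rather than against a fixed label list, any phrase the text encoder understands becomes a detectable category.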

Related Concepts
