Object Detection: Definition & Meaning — AI Wiki

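Object detection identifies and localizes objects in an image or video, drawing a bounding box around each one and classifying what each box contains: "there is a car at (x1, y1, x2, y2) and a person at (x3, y3, x4, y4)." Unlike image classification, which says only what is in an image, object detection says what is there and where, which makes counting, tracking, and spatial reasoning possible.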
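To make the "what and where" output concrete, here is a minimal sketch of how a list of detections might be represented and counted. The `Detection` class, the `count_by_label` helper, and the sample values are hypothetical illustrations, not the output format of any particular detector.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a class label, a confidence score, and a box."""
    label: str
    score: float
    x1: float  # left edge, in pixels
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge

    @property
    def center(self):
        """Center point of the box, useful for tracking or spatial reasoning."""
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def count_by_label(detections, min_score=0.5):
    """Count detections per class, ignoring low-confidence ones."""
    return Counter(d.label for d in detections if d.score >= min_score)


# Made-up detections for a single image.
detections = [
    Detection("car", 0.92, 40, 120, 300, 260),
    Detection("person", 0.88, 320, 90, 380, 270),
    Detection("person", 0.31, 500, 100, 540, 220),  # below the score threshold
]

counts = count_by_label(detections)
print(counts["car"], counts["person"])  # 1 1
print(detections[0].center)  # (170.0, 190.0)
```

Because every detection carries coordinates, counting and locating objects become simple operations on this list, which a classification label alone would not support.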

Why It Matters

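Object detection is the technology behind self-driving cars (detecting pedestrians, vehicles, and signs), security cameras (person detection), retail analytics (counting customers), manufacturing quality control (detecting defects), and augmented reality (anchoring virtual objects to real ones). It is one of the most widely deployed computer vision capabilities in commercial use.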

Deep Dive

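The YOLO (You Only Look Once) family is among the most widely used real-time object detection architectures. The original YOLO divides the image into a grid, predicts bounding boxes and class probabilities for each grid cell in a single forward pass, and then filters overlapping detections, typically with non-maximum suppression (NMS). Recent variants such as YOLOv8 and YOLO-World reach real-time speeds (30+ FPS) with high accuracy on consumer GPUs, depending on model size and input resolution. The main alternative, two-stage detectors such as Faster R-CNN, first propose candidate regions and then classify and refine each one; they have traditionally been more accurate but slower.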
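The step that filters overlapping detections is commonly implemented as greedy non-maximum suppression (NMS) based on intersection over union (IoU). The following is a minimal pure-Python sketch of that idea, not the implementation used by any specific YOLO release; the function names, the 0.5 threshold, and the sample boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [
    (10, 10, 110, 110),    # strong detection of an object
    (20, 20, 120, 120),    # near-duplicate of the first box
    (200, 200, 300, 300),  # a different object
]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

In practice NMS is usually applied per class, so a person standing in front of a car does not suppress the car. Production code would normally use a vectorized library routine rather than this quadratic loop.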

Beyond Bounding Boxes

Bounding boxes are rectangles: they approximate an object's location but also include background pixels. Instance segmentation (e.g., Mask R-CNN, SAM) produces a pixel-level mask for each object. Panoptic segmentation labels every pixel as belonging either to a specific object instance or to a background class such as sky or road. Keypoint detection identifies specific points on objects, such as the joints of a human body for pose estimation. Each adds spatial precision at the cost of additional compute.
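A small sketch can show how much background a box includes compared with a mask. The code below derives the tightest box from a toy binary mask and measures what fraction of that box is actually object; the helper names and the 5x5 mask are hypothetical, and real masks would normally be NumPy arrays or tensors.

```python
def mask_to_box(mask):
    """Tightest (x1, y1, x2, y2) box around the nonzero pixels of a 2D mask.

    Coordinates are inclusive pixel indices; returns None for an empty mask.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


def fill_ratio(mask):
    """Fraction of the bounding box's pixels that belong to the object."""
    box = mask_to_box(mask)
    if box is None:
        return 0.0
    x1, y1, x2, y2 = box
    box_area = (x2 - x1 + 1) * (y2 - y1 + 1)
    object_pixels = sum(sum(row) for row in mask)
    return object_pixels / box_area


# A toy 5x5 mask of a diagonal object: 1 = object pixel, 0 = background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(mask))  # (1, 1, 3, 3)
print(fill_ratio(mask))   # 5 object pixels in a 9-pixel box
```

For this diagonal shape only five of the nine pixels inside the box are object, which is the kind of imprecision a mask removes.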

Zero-Shot Detection

Traditional object detectors find only the categories they were trained on. Zero-shot (open-vocabulary) detectors such as Grounding DINO, OWL-ViT, and YOLO-World can find objects described in natural language: a prompt like "find all coffee cups" can work even if the model was never explicitly trained on a coffee-cup category. This is possible because these models combine vision and language understanding, matching text descriptions to image regions. That flexibility is especially valuable in applications where the objects of interest change frequently.
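The matching of text descriptions to image regions can be illustrated with a toy similarity search. The sketch below assumes that some encoder has already mapped each prompt and each candidate region to a vector in a shared space; the three-dimensional embeddings, the `match_regions` helper, and the 0.8 threshold are all made up for illustration and do not reflect how Grounding DINO, OWL-ViT, or YOLO-World are actually implemented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_regions(region_embeddings, text_embeddings, threshold=0.8):
    """Label each region with its best-matching prompt, or None if no
    prompt reaches the similarity threshold."""
    results = []
    for region in region_embeddings:
        scored = [(cosine(region, vec), prompt)
                  for prompt, vec in text_embeddings.items()]
        best_score, best_prompt = max(scored)
        results.append(best_prompt if best_score >= threshold else None)
    return results


# Hypothetical 3-dimensional embeddings; real models use far more dimensions,
# produced by trained image and text encoders.
text_embeddings = {
    "coffee cup": [0.9, 0.1, 0.0],
    "laptop": [0.0, 0.2, 0.9],
}
region_embeddings = [
    [0.8, 0.2, 0.1],  # region that resembles the "coffee cup" prompt
    [0.1, 0.1, 1.0],  # region that resembles the "laptop" prompt
    [0.5, 0.8, 0.5],  # region that matches neither prompt well
]
labels = match_regions(region_embeddings, text_embeddings)
print(labels)  # ['coffee cup', 'laptop', None]
```

Changing what the detector looks for only requires changing the prompts in `text_embeddings`, not retraining, which is why this approach suits applications whose target objects change often.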
