Pose Estimation: Definition & Meaning — AI Wiki

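Pose estimation detects the position and orientation of a human body (or an animal, hand, or face) in an image or video by locating key anatomical points such as joints, facial landmarks, and fingertips. The output is a skeleton: a set of connected keypoints that represents the body's pose. OpenPose, MediaPipe, and YOLO-Pose are popular implementations.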

Why It Matters

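Pose estimation enables applications such as fitness apps that analyze exercise form, sign language recognition, motion capture for animation, gesture-controlled interfaces, sports analytics, and fall detection in elder care. In AI image generation, a pose skeleton serves as a ControlNet input: you specify the exact body pose you want, and the model generates a figure in that pose.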

Deep Dive

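The task: given an image, predict 2D coordinates (x, y) for each keypoint. The widely used COCO layout defines 17 body keypoints: the nose, plus left and right eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. Top-down approaches first detect people (bounding boxes), then estimate pose within each box. Bottom-up approaches detect all keypoints in the image first, then group them into individuals. Top-down tends to be more accurate when few people are present, but its cost grows with each detected person; bottom-up cost depends much less on the number of people, so it tends to be faster for crowds.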
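
The last step of a top-down pipeline can be sketched in a few lines. This is a minimal illustration, not any particular library's API: it assumes the pose model returns keypoints normalized to [0, 1] inside the person crop, and the helper name `crop_to_image` and the `(x, y, width, height)` box format are hypothetical choices made for this example.

```python
# The 17 body keypoints in COCO order.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def crop_to_image(keypoints_norm, box):
    """Map keypoints from normalized crop coordinates to image pixels.

    keypoints_norm: list of (u, v) pairs in [0, 1], relative to the crop.
    box: (x, y, width, height) of the person's bounding box in pixels
         (an assumed format; detectors differ).
    """
    x0, y0, w, h = box
    return [(x0 + u * w, y0 + v * h) for u, v in keypoints_norm]


# A keypoint at the center of a 200x100 box whose top-left corner is (100, 50).
center = crop_to_image([(0.5, 0.5)], (100, 50, 200, 100))
```

In a top-down system this mapping runs once per detected person, which is one reason the total cost scales with the number of people in the frame.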

3D Pose

2D pose gives (x, y) in image coordinates. 3D pose estimates (x, y, z) in real-world coordinates, which adds depth information (is the hand reaching toward or away from the camera?). 3D pose is essential for motion capture, VR/AR, and robotics. Models like MotionBERT and 4D-Humans estimate 3D pose from monocular images or video by leveraging learned priors about human body proportions and physics.
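
Why learned priors are needed can be seen from the camera geometry alone. The sketch below uses a standard pinhole camera model, which is an assumption made for illustration and not how any of the models above work internally; the intrinsics (`fx`, `fy`, `cx`, `cy`) are made-up values.

```python
def project(point_3d, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0):
    """Project a 3D point in camera coordinates (meters) to pixel coordinates.

    Uses the pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy.
    The default intrinsics are illustrative, not taken from a real camera.
    """
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (fx * X / Z + cx, fy * Y / Z + cy)


# Two different 3D positions that lie on the same camera ray.
near_wrist = (0.5, 0.25, 2.0)  # 2 m from the camera
far_wrist = (1.0, 0.5, 4.0)    # 4 m from the camera, twice as far off-axis

# Both land on the same pixel, so the 2D keypoint alone cannot tell them apart.
same_pixel = project(near_wrist) == project(far_wrist)
```

Because every pixel corresponds to a whole ray of possible 3D positions, a single-camera model has to resolve depth from other cues, such as plausible limb lengths and joint limits.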

Beyond Body Pose

Hand pose estimation tracks 21 keypoints per hand, enabling gesture recognition and sign language understanding. Face landmark detection tracks 468+ points for expression analysis, face filters, and emotion recognition. Animal pose estimation adapts the same techniques to quadrupeds, enabling wildlife research and veterinary applications. MediaPipe (Google) provides real-time solutions for body, hand, and face pose that run on mobile devices.
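
To show how 21 hand keypoints can feed gesture recognition, here is a rough heuristic sketch. It assumes the landmark indexing used by MediaPipe Hands (0 is the wrist, followed by four points per finger from base to tip); the function `extended_fingers` and its distance rule are hypothetical and far simpler than a production gesture classifier.

```python
import math

# Indices in the 21-point hand layout: 0 = wrist, then four points per finger.
WRIST = 0
FINGERS = {
    "index": (6, 8),     # (PIP joint, fingertip)
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}


def extended_fingers(landmarks):
    """Return the names of fingers that look extended.

    landmarks: sequence of 21 (x, y) tuples.
    A finger counts as extended when its tip is farther from the wrist
    than its PIP joint. The thumb is skipped because this simple distance
    test is unreliable for it.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    wrist = landmarks[WRIST]
    result = []
    for name, (pip, tip) in FINGERS.items():
        if math.dist(landmarks[tip], wrist) > math.dist(landmarks[pip], wrist):
            result.append(name)
    return result
```

A downstream rule could then map the returned set to a gesture, for example treating index plus middle as a "peace" sign, though real systems usually train a classifier on the full landmark set.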
