
Pose Estimation

Body Pose, Skeleton Detection, Keypoint Detection
Detecting the position and orientation of human bodies (or animals, hands, faces) in images or video by localizing key anatomical points: joints, facial landmarks, fingertips. The output is a skeleton: a connected set of keypoints representing the body's pose. OpenPose, MediaPipe, and YOLO-Pose are popular implementations.

Why It Matters

Pose estimation enables fitness apps that analyze exercise form, sign language recognition, motion capture for animation, gesture-controlled interfaces, sports analytics, and fall detection in elder care. In AI image generation, pose skeletons serve as ControlNet inputs: you specify the exact body pose you want, and the model generates a figure in that pose.

Deep Dive

The task: given an image, predict 2D coordinates (x, y) for each keypoint (17 for body: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles). Top-down approaches first detect people (bounding boxes), then estimate pose within each box. Bottom-up approaches detect all keypoints first, then group them into individuals. Top-down is more accurate for few people; bottom-up is faster for crowds.
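The 17-keypoint output described above follows the COCO convention, which also fixes which keypoints connect to form the drawn skeleton. A minimal sketch of that data layout (keypoint order and edges are the standard COCO definition; the helper function is illustrative):

```python
# The 17 COCO body keypoints, in their standard index order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Skeleton edges: pairs of keypoint indices to connect when drawing a pose.
SKELETON = [
    (5, 7), (7, 9),      # left arm: shoulder -> elbow -> wrist
    (6, 8), (8, 10),     # right arm
    (11, 13), (13, 15),  # left leg: hip -> knee -> ankle
    (12, 14), (14, 16),  # right leg
    (5, 6), (11, 12),    # shoulder line, hip line
    (5, 11), (6, 12),    # torso sides
]

def to_named(keypoints):
    """Map a list of 17 (x, y, confidence) triples to keypoint names."""
    assert len(keypoints) == len(COCO_KEYPOINTS)
    return dict(zip(COCO_KEYPOINTS, keypoints))
```

A model's raw output is typically just such a list of (x, y, confidence) triples per person; top-down and bottom-up methods differ only in how they arrive at it.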

3D Pose

2D pose gives (x, y) in image coordinates. 3D pose estimates (x, y, z) in real-world coordinates, enabling depth perception (is the hand reaching toward or away from the camera?). 3D pose is essential for motion capture, VR/AR, and robotics. Models like MotionBERT and 4DHumans estimate 3D pose from a single 2D image by leveraging learned priors about human body proportions and physics.
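The reason single-image 3D pose needs learned priors can be seen directly from the pinhole projection equations: scaling a 3D point's distance from the camera leaves its 2D projection unchanged, so depth is not recoverable from pixels alone. A small numeric illustration (the focal length is an arbitrary made-up value):

```python
# Pinhole projection: a point at (X, Y, Z) and one at (2X, 2Y, 2Z)
# land on exactly the same pixel, which is why monocular 3D pose
# models must fall back on priors like known limb proportions.

def project(point3d, f=1000.0):
    """Perspective projection of (X, Y, Z) in meters to pixel coords (u, v)."""
    X, Y, Z = point3d
    return (f * X / Z, f * Y / Z)

wrist_near = (0.2, -0.1, 2.0)                  # 2 m from the camera
wrist_far = tuple(2 * c for c in wrist_near)   # twice as far, twice as large

print(project(wrist_near))  # (100.0, -50.0)
print(project(wrist_far))   # (100.0, -50.0) -- identical projection
```

Models like MotionBERT resolve this ambiguity statistically: among all 3D poses consistent with the 2D observation, they pick the one most plausible for a human body.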

Beyond Body Pose

Hand pose estimation tracks 21 keypoints per hand, enabling gesture recognition and sign language understanding. Face landmark detection tracks 468+ points for expression analysis, face filters, and emotion recognition. Animal pose estimation adapts the same techniques to quadrupeds, enabling wildlife research and veterinary applications. MediaPipe (Google) provides real-time solutions for body, hand, and face pose that run on mobile devices.
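To make the 21-keypoint hand layout concrete, here is a toy gesture heuristic, assuming the MediaPipe index convention (0 = wrist, fingertips at 8/12/16/20, the joint below each of those tips at 6/10/14/18). Real gesture recognizers use learned classifiers; this distance test and the sample coordinates are purely illustrative:

```python
import math

# Tip and middle-joint indices for the four non-thumb fingers
# (assumed MediaPipe hand landmark numbering).
TIPS = {"index": 8, "middle": 12, "ring": 16, "pinky": 20}
PIPS = {"index": 6, "middle": 10, "ring": 14, "pinky": 18}

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def extended_fingers(kp):
    """A finger counts as extended if its tip lies farther from the
    wrist than its middle joint (a crude test for an open hand)."""
    wrist = kp[0]
    return [name for name in TIPS
            if dist(kp[TIPS[name]], wrist) > dist(kp[PIPS[name]], wrist)]

# Synthetic hand: index pointing, other fingers curled (made-up coords).
kp = [(0.0, 0.0)] * 21
kp[6], kp[8] = (0.0, 0.5), (0.0, 0.9)    # index: tip beyond middle joint
kp[10], kp[12] = (0.1, 0.5), (0.1, 0.3)  # middle: tip folded back in
kp[14], kp[16] = (0.2, 0.5), (0.2, 0.3)  # ring curled
kp[18], kp[20] = (0.3, 0.4), (0.3, 0.2)  # pinky curled

print(extended_fingers(kp))  # ['index']
```

The same pattern (keypoints in, simple geometry out) underlies many lightweight gesture interfaces before any machine-learned classifier is involved.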
