
Absolute and Relative Depth Fusion for 3D Multi-Person Pose Estimation

Abstract: To address the problems of imprecise scale representation, inaccurate pose recovery, and insufficient depth fusion in 3D multi-person pose estimation from monocular images, an absolute and relative depth fusion method is proposed. First, human detection generates multiple human instances and yields the 2D coordinates of dual-root joints; guided by the neck and pelvis coordinates, absolute depth features are extracted for multi-person absolute depth estimation. Then, a relative depth estimation module built on a diffusion model captures the relative depth information and spatial relationships among each person's joints, producing multiple single-person root-relative depths and relative 3D poses. Finally, coordinate cascading and a perspective camera model are combined to fuse the multi-person absolute root depths with the single-person root-relative 3D poses, generating the final 3D multi-person poses. Experimental results show that, on the Human3.6M and MuPoTs-3D datasets, the proposed method reduces the mean per-joint position error by 3.7% compared with existing methods and improves the percentage of correct 3D keypoints by 2.2 and 2.5 percentage points, respectively, yielding accurate 3D multi-person pose estimates. Qualitative results on the COCO dataset further demonstrate the method's strong generalization.
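The final fusion step described above can be illustrated with a minimal sketch: given a person's 2D joint pixel coordinates, a predicted root-relative depth per joint, and the estimated absolute root depth, a standard pinhole (perspective) camera model back-projects each joint into camera-space 3D. The function and variable names below are illustrative, not taken from the paper, and the sketch omits the detection and diffusion-based estimation stages.

```python
import numpy as np

def fuse_pose(joints_2d, rel_depth, root_abs_depth, K):
    """Back-project 2D joints to camera-space 3D with a pinhole camera model.

    joints_2d:      (J, 2) pixel coordinates of the joints
    rel_depth:      (J,) root-relative depths (root joint = 0)
    root_abs_depth: scalar absolute depth of the root joint (e.g., pelvis)
    K:              (3, 3) camera intrinsic matrix
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = root_abs_depth + rel_depth           # absolute depth of every joint
    x = (joints_2d[:, 0] - cx) * z / fx      # perspective back-projection
    y = (joints_2d[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)       # (J, 3) camera-space 3D pose
```

Applying this per detected person, with each person's own absolute root depth, yields the multi-person 3D result in a shared camera coordinate frame.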

     
