Citation: Yu ZHANG, Li LIU, XiaoDong FU, LiJun LIU, Wei PENG. Multi-Scale Spatial-Temporal Feature Fusion For 3D Human Pose Estimation[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.2023-00041

Multi-Scale Spatial-Temporal Feature Fusion For 3D Human Pose Estimation

  • To address inaccurate feature representation, unsmooth results, and high computational cost in video-based single-person three-dimensional human pose estimation, a multi-scale spatial-temporal feature fusion method is proposed. First, joint, limb, and upper/lower-body tokens are defined in the spatial domain and combined with positional embeddings to represent the spatial multi-scale features of the human body. Second, a spatial multi-scale feature fusion module, built on the self-attention mechanism and a multilayer perceptron, fuses the joint, limb, and upper/lower-body features into an initial pose feature sequence. Finally, a temporal multi-scale encoding performs temporal feature fusion to obtain the final pose feature sequence, and temporal decoding generates the refined three-dimensional human pose. On the Human3.6M dataset, the mean per joint position error under Protocol 2 and the joint velocity error are 33.6 and 2.4 respectively, reductions of 2.3% and 4%. The proposed method improves three-dimensional human pose estimation accuracy and generates precise, smooth results while reducing computational cost. Experiments on the HumanEva-I dataset further show that the method has a certain degree of generalization ability.
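The spatial fusion step described above (multi-scale tokens combined by self-attention and a multilayer perceptron) can be illustrated with a minimal sketch. The sketch below assumes PyTorch and a 17-joint Human3.6M skeleton; the embedding width, number of attention heads, the joint-to-limb and joint-to-body groupings, and all class and variable names are illustrative assumptions based only on the abstract, not the authors' actual implementation.

    # Minimal sketch of multi-scale spatial token fusion (assumed PyTorch).
    # Joint, limb, and upper/lower-body tokens get positional embeddings and
    # are fused by self-attention + MLP, as the abstract describes.
    import torch
    import torch.nn as nn

    NUM_JOINTS = 17      # Human3.6M skeleton
    NUM_LIMBS = 4        # assumed grouping: two arms, two legs
    NUM_BODY_PARTS = 2   # upper / lower body
    DIM = 64             # assumed token embedding width

    class SpatialMultiScaleFusion(nn.Module):
        """Fuse joint-, limb-, and body-level tokens (illustrative only)."""

        def __init__(self, dim=DIM, heads=4):
            super().__init__()
            n_tokens = NUM_JOINTS + NUM_LIMBS + NUM_BODY_PARTS
            # Lift 2D joint coordinates to token embeddings.
            self.joint_embed = nn.Linear(2, dim)
            # Learnable positional embeddings over all multi-scale tokens.
            self.pos_embed = nn.Parameter(torch.zeros(1, n_tokens, dim))
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
            )
            # Hypothetical joint groupings for the coarser tokens.
            self.limb_groups = [[11, 12, 13], [14, 15, 16],
                                [1, 2, 3], [4, 5, 6]]
            self.body_groups = [[0, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
                                [0, 1, 2, 3, 4, 5, 6]]

        def forward(self, joints_2d):             # (B, 17, 2)
            j = self.joint_embed(joints_2d)       # joint tokens (B, 17, D)
            # Limb and body tokens are mean-pooled from assumed joint groups.
            limb = torch.stack([j[:, g].mean(1) for g in self.limb_groups], 1)
            body = torch.stack([j[:, g].mean(1) for g in self.body_groups], 1)
            x = torch.cat([j, limb, body], dim=1) + self.pos_embed
            # Self-attention lets each scale attend to every other scale.
            a, _ = self.attn(x, x, x)
            x = self.norm1(x + a)
            x = self.norm2(x + self.mlp(x))
            return x[:, :NUM_JOINTS]              # fused per-joint features

    # Usage: one frame's 2D pose -> fused spatial features that would feed
    # the temporal multi-scale encoding stage.
    pose_2d = torch.randn(8, NUM_JOINTS, 2)
    features = SpatialMultiScaleFusion()(pose_2d)
    print(features.shape)  # torch.Size([8, 17, 64])

In this sketch the limb and upper/lower-body tokens are mean-pooled from hypothetical joint groups; the paper's actual token construction and the temporal encoding/decoding stages are not specified in the abstract and are omitted here.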
