Spatial-Temporal Feature Reinforcement Learning for 3D Human Pose and Shape Estimation
Graphical Abstract
Abstract
To address insufficient spatial-temporal modeling, complex local dependencies, and weak estimation robustness in three-dimensional human pose and shape estimation from single-view video, a spatial-temporal feature reinforcement learning method is proposed. First, a global spatial-temporal feature reinforcement module is constructed to extract static features from the input video sequence; global correlation modeling and global temporal feature fusion are performed on two sub-sequences containing the intermediate frame to obtain the integrated temporal features. Second, a spatial-temporal dual-branch encoder composed of graph convolution and a self-attention mechanism is designed to model the local dependencies of the human body for local spatial-temporal feature reinforcement learning, yielding a refined three-dimensional pose. Finally, a global-local spatial-temporal feature fusion method based on a dual attention mechanism is proposed to fuse the temporal, pose, and shape features and produce the final estimated three-dimensional human body mesh. Experimental results on the Human3.6M dataset show that the PA-MPJPE and MPJPE are 36.0 mm and 49.7 mm respectively, reductions of 0.6 mm and 1.9 mm compared with the comparison method. The proposed method improves the accuracy of three-dimensional human pose and shape estimation and generates precise and smooth three-dimensional human bodies. Furthermore, test results on the 3DPW dataset and Internet videos show that the proposed method retains a degree of robustness under occluded limbs, varying backgrounds, and different scene conditions.
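To make the dual-branch design concrete, the sketch below illustrates (in PyTorch) one possible form of a spatial-temporal dual-branch encoder with attention-based fusion: a graph-convolution branch models local joint dependencies over the skeleton while a self-attention branch models dependencies across frames. This is not the authors' implementation; the layer sizes, joint count, adjacency handling, and gating rule are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the paper's code) of a spatial-temporal
# dual-branch encoder: graph convolution over joints + self-attention over
# frames, fused by per-joint attention weights (a simple "dual attention" gate).
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer over a fixed, normalized skeleton adjacency."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)            # (J, J) adjacency matrix
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                            # x: (N, J, C)
        return torch.relu(self.adj @ self.proj(x))   # aggregate neighboring joints

class DualBranchEncoder(nn.Module):
    """Spatial (graph) branch + temporal (self-attention) branch with gated fusion."""
    def __init__(self, num_joints=24, feat_dim=256, adj=None):
        super().__init__()
        adj = adj if adj is not None else torch.eye(num_joints)  # placeholder skeleton
        self.spatial = GraphConv(feat_dim, feat_dim, adj)
        self.temporal = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * feat_dim, 2)       # weights for the two branches

    def forward(self, x):                            # x: (B, T, J, C) per-frame joint features
        B, T, J, C = x.shape
        # Spatial branch: model joint-to-joint dependencies within each frame.
        s = self.spatial(x.reshape(B * T, J, C)).reshape(B, T, J, C)
        # Temporal branch: attend across the T frames for each joint independently.
        t_in = x.permute(0, 2, 1, 3).reshape(B * J, T, C)
        t, _ = self.temporal(t_in, t_in, t_in)
        t = t.reshape(B, J, T, C).permute(0, 2, 1, 3)
        # Dual-attention fusion: softmax-normalized weights per joint and frame.
        w = torch.softmax(self.gate(torch.cat([s, t], dim=-1)), dim=-1)  # (B, T, J, 2)
        return w[..., :1] * s + w[..., 1:] * t       # fused spatial-temporal feature

# Usage: 16-frame clip, 24 joints, 256-dim features per joint (all assumed sizes).
enc = DualBranchEncoder()
fused = enc(torch.randn(2, 16, 24, 256))             # -> (2, 16, 24, 256)
```

The two branches operate on reshaped views of the same tensor, so the fusion gate can weigh, per joint and per frame, how much of the spatial versus temporal evidence to keep; the actual paper may fuse features differently.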