Pose Estimation Algorithm with Enhanced Human Keypoint Features
Graphical Abstract
Abstract
2D multi-person pose estimation is a challenging task in computer vision. Existing regression-based single-stage methods are not tailored to learning multi-person pose representations, which leads to insufficient feature extraction for the human joint structure and insufficient feature fusion for keypoint localization. To address the weak constraint on keypoint features, this paper proposes a pose estimation algorithm with enhanced human keypoint features. First, following the idea of MixFormer, a strategy that runs the multi-head self-attention mechanism and depthwise convolution in parallel is proposed, which enhances the local and global features of the human pose simultaneously; high-quality visual representations are obtained while more human joint structure information is learned. Second, a strategy that fuses the coordinate attention mechanism and atrous spatial pyramid pooling in series is proposed: the human pose features are split to capture cross-channel information, and atrous convolution is then used to expand the receptive field, which reduces the loss of small-object information and strengthens multi-scale feature fusion. Finally, the two modules are combined with YOLO-Pose for multi-person pose estimation. Experiments on two datasets show that the proposed algorithm improves AP by 0.9 percentage points and AR by 1.2 percentage points on the COCO2017 dataset, and improves AP by 2.3 percentage points on the crowded OC_Human dataset. While keeping the running speed unchanged, the algorithm enhances human keypoint features and improves overall performance.
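To make the two strategies summarized above more concrete, the following PyTorch sketch illustrates (a) a block that runs multi-head self-attention and depthwise convolution in parallel and (b) a serial coordinate-attention-then-ASPP module. The module names (ParallelMixBlock, SerialCAASPP), channel sizes, dilation rates, and layer layout are illustrative assumptions for exposition only, not the authors' implementation.

# Minimal sketch of the two strategies in the abstract; all names and
# hyperparameters below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallelMixBlock(nn.Module):
    """MixFormer-style block: multi-head self-attention and depthwise
    convolution run in parallel, so global joint-structure context and
    local keypoint detail are enhanced simultaneously."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels)  # depthwise (local) branch
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Global branch: flatten spatial positions into tokens for self-attention.
        tokens = self.norm(x.flatten(2).transpose(1, 2))      # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        attn_out = attn_out.transpose(1, 2).reshape(b, c, h, w)
        # Local branch: depthwise convolution on the original feature map.
        local_out = self.dwconv(x)
        return x + attn_out + local_out                        # parallel fusion


class SerialCAASPP(nn.Module):
    """Coordinate attention followed by ASPP in series: first re-weight
    features with direction-aware (H/W) pooled context, then enlarge the
    receptive field with dilated convolutions."""

    def __init__(self, channels: int, dilations=(1, 6, 12)):
        super().__init__()
        mid = max(8, channels // 8)
        # Coordinate attention: shared reduction, separate H/W projections.
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)
        # ASPP: parallel atrous convolutions fused by a 1x1 projection.
        self.aspp = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        self.project = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Coordinate attention (simplified): pool along W and H separately.
        pooled_h = x.mean(dim=3, keepdim=True)                 # (B, C, H, 1)
        pooled_w = x.mean(dim=2, keepdim=True)                 # (B, C, 1, W)
        y = torch.cat([pooled_h, pooled_w.transpose(2, 3)], dim=2)
        y = F.relu(self.reduce(y))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                  # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))  # (B, C, 1, W)
        x = x * a_h * a_w
        # ASPP: expand the receptive field with dilated convolutions.
        return self.project(torch.cat([branch(x) for branch in self.aspp], dim=1))


if __name__ == "__main__":
    feat = torch.randn(1, 64, 32, 32)      # dummy backbone feature map
    feat = ParallelMixBlock(64)(feat)
    feat = SerialCAASPP(64)(feat)
    print(feat.shape)                      # torch.Size([1, 64, 32, 32])

In this sketch the two modules are applied in sequence to a backbone feature map; in a YOLO-Pose-style detector they would be inserted into the neck before the keypoint regression heads, but the exact insertion points depend on the authors' network design.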