Pose Estimation Algorithm with Enhanced Human Keypoint Features
Abstract: Two-dimensional multi-person pose estimation is a challenging task in computer vision. Most regression-based single-stage methods do not learn multi-person pose features in a targeted way. As a result, they extract the structural features of human joints poorly and fuse keypoint-location features poorly. To address this weak constraint on human keypoint features, this paper proposes an algorithm that enhances them. First, following the idea of MixFormer, a strategy that runs multi-head self-attention and depthwise convolution in parallel is proposed. It strengthens both the local and the global features of the human pose, so the network learns more joint-structure information while obtaining high-quality visual representations. Second, a strategy that fuses the coordinate attention mechanism and atrous spatial pyramid pooling in series is proposed. The pose features are first split to capture cross-channel information, and atrous convolution then enlarges the receptive field. This reduces the loss of small-object information and improves multi-scale feature fusion. Finally, the strategies are combined with YOLO-Pose to perform multi-person pose estimation. Experiments on two datasets show the following results. On COCO2017, the proposed algorithm improves AP by 0.9 percentage points and AR by 1.2 percentage points. On the OC_Human occlusion dataset, it improves AP by 2.3 percentage points. The algorithm enhances human keypoint features and improves overall performance without sacrificing running speed.
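The abstract names coordinate attention and atrous spatial pyramid pooling (ASPP) but gives no detail on either. The plain-Python sketch below illustrates one primitive from each. It is not the authors' implementation. The function names, the nested-list `[channel][row][col]` layout and the dilation rates `(1, 6, 12, 18)` are assumptions chosen for illustration. Only the directional pooling step of coordinate attention is shown. A full block would also pass the pooled maps through a shared 1x1 convolution, split them again and apply sigmoid gates.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # assumed layout: [channel][row][col]


def coordinate_pool(x: Tensor3) -> Tuple[List[List[float]], List[List[float]]]:
    """Directional average pooling, the first step of coordinate attention.

    z_h[c][i] averages row i of channel c over the width;
    z_w[c][j] averages column j of channel c over the height.
    Keeping the two directions separate preserves positional information.
    """
    z_h = [[sum(row) / len(row) for row in ch] for ch in x]
    z_w = [[sum(col) / len(col) for col in zip(*ch)] for ch in x]
    return z_h, z_w


def effective_kernel_size(k: int, dilation: int) -> int:
    """Extent covered by a k-tap kernel whose taps are `dilation` apart."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal: List[float], weights: List[float], dilation: int) -> List[float]:
    """'Valid' 1-D atrous convolution (cross-correlation, as in CNN libraries)."""
    span = effective_kernel_size(len(weights), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(weights))
        for start in range(len(signal) - span + 1)
    ]


# Hypothetical ASPP branch rates for 3-tap (3x3) kernels; illustrative only.
ASPP_RATES = (1, 6, 12, 18)

if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel, 2x2 feature map
    print(coordinate_pool(x))  # ([[1.5, 3.5]], [[2.0, 3.0]])
    print(dilated_conv1d([0, 1, 2, 3, 4, 5, 6], [1, 1, 1], dilation=2))  # [6, 9, 12]
    for r in ASPP_RATES:
        print(r, effective_kernel_size(3, r))  # extents 3, 13, 25, 37
```

The 3-tap kernel keeps three weights at every rate. Dilation therefore widens the covered extent (3, 13, 25 and 37 for rates 1, 6, 12 and 18) without adding parameters. This is why ASPP can combine several receptive-field sizes cheaply.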