Yang Yuntao, Nie Yongwei, Zhang Qing, Li Ping, Li Guiqing. Bi-Directional Human Pose Completion Based on RNN and Attention Mechanism[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(11): 1772-1783. DOI: 10.3724/SP.J.1089.2022.19196

Bi-Directional Human Pose Completion Based on RNN and Attention Mechanism

Abstract: To address mutual occlusion between pedestrians in surveillance video, this paper proposes a human pose sequence completion method. Given the visible pose sequences on both sides of a gap, the method generates the missing poses in between. It consists of three steps: (1) an attention-based sequence-to-sequence model takes the visible poses at both ends as input and outputs the target pose sequence in the middle; (2) the same attention-based sequence-to-sequence model takes the two end sequences in reverse order as input and outputs the target pose sequence again; (3) the predictions from the two directions are blended to obtain the final target pose sequence. The method effectively recovers the poses of occluded pedestrians when the poses before and after the occlusion are known. It is evaluated on Human3.6M, CASIA, and other datasets, using the L2 error between the generated poses and the ground truth as the metric. Compared with related methods, the average completion error drops from 81.6 to 42.5, a 47.9% reduction.
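The blending and evaluation steps can be illustrated with a minimal sketch. The abstract does not say how the two predictions are mixed or how the L2 error is aggregated, so the linear time-dependent weights and the per-frame averaging below are assumptions, and the names `blend_bidirectional`, `mean_l2_error`, and `relative_reduction` are hypothetical rather than taken from the paper.

```python
import math


def blend_bidirectional(forward, backward):
    """Blend two predictions of the same missing pose sequence.

    Both inputs are lists of frames in chronological order; each frame is a
    flat list of joint coordinates. Assumption (not from the paper): the
    forward prediction is trusted more near the start of the gap and the
    backward prediction more near the end, with linearly varying weights.
    """
    if len(forward) != len(backward):
        raise ValueError("both predictions must cover the same frames")
    n = len(forward)
    blended = []
    for t, (f, b) in enumerate(zip(forward, backward)):
        w_back = (t + 1) / (n + 1)  # grows toward the end of the gap
        w_fwd = 1.0 - w_back
        blended.append([w_fwd * x + w_back * y for x, y in zip(f, b)])
    return blended


def mean_l2_error(pred, truth):
    """Average Euclidean distance between predicted and true frames.

    Assumption: the error is averaged per frame; the paper may instead
    average per joint.
    """
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(pred)


def relative_reduction(baseline, ours):
    """Relative error reduction in percent."""
    return (baseline - ours) / baseline * 100.0


if __name__ == "__main__":
    fwd = [[0.0, 0.0], [0.0, 0.0]]
    bwd = [[3.0, 3.0], [3.0, 3.0]]
    print(blend_bidirectional(fwd, bwd))
    print(round(relative_reduction(81.6, 42.5), 1))
```

Applied to the figures reported in the abstract, `relative_reduction(81.6, 42.5)` gives (81.6 - 42.5) / 81.6, which rounds to the stated 47.9%.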

     
