Bi-Directional Human Pose Completion Based on RNN and Attention Mechanism

Yang Yuntao (杨韫韬); Nie Yongwei (聂勇伟); Zhang Qing (张青); Li Ping (李平); Li Guiqing (李桂清)

doi:10.3724/SP.J.1089.2022.19196

Abstract

To address mutual occlusion between pedestrians in surveillance videos, a human pose sequence completion method is proposed. Given the visible pose sequences at both ends of a gap, the method generates the missing pose sequence in between. It consists of three steps: (1) an attention-based sequence-to-sequence model takes the visible pose sequences at both ends as input and outputs the target pose sequence in the middle; (2) the same attention-based sequence-to-sequence model takes the two end sequences in reverse order as input and again outputs the target pose sequence in the middle; (3) the predictions from the two directions are blended to obtain the final target sequence. The method effectively recovers the poses of occluded pedestrians when the poses before and after the occlusion are known. It is evaluated on Human3.6M, CASIA, and other datasets, using the L₂ error between the generated poses and the ground truth as the evaluation metric. Compared with related methods, the average completion error drops from 81.6 to 42.5, an improvement of 47.9%.
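The abstract does not say how the two directional predictions are blended or exactly how the L₂ error is aggregated, so the following is only a minimal sketch under stated assumptions. It assumes a linear cross-fade (the forward prediction is trusted more near the start of the gap, the backward prediction more near the end) and a per-frame Euclidean distance averaged over frames. The function names `blend_bidirectional`, `mean_l2_error`, and `relative_improvement` are hypothetical and not taken from the paper.

```python
import math


def blend_bidirectional(forward, backward):
    """Blend two predictions of the same missing segment, frame by frame.

    forward[t] and backward[t] are flat lists of joint coordinates for
    frame t, both in chronological order. The linear weighting is an
    assumption; the paper's actual blending rule may differ.
    """
    n = len(forward)
    if n != len(backward):
        raise ValueError("both predictions must cover the same frames")
    blended = []
    for t in range(n):
        # Weight of the backward prediction grows from 0 to 1 across the gap.
        w = t / (n - 1) if n > 1 else 0.5
        blended.append(
            [(1 - w) * f + w * b for f, b in zip(forward[t], backward[t])]
        )
    return blended


def mean_l2_error(pred, truth):
    """Average over frames of the Euclidean distance between pose vectors
    (assumed aggregation; the paper may average per joint instead)."""
    dists = [
        math.sqrt(sum((p - q) ** 2 for p, q in zip(pf, tf)))
        for pf, tf in zip(pred, truth)
    ]
    return sum(dists) / len(dists)


def relative_improvement(baseline_error, new_error):
    """Fraction by which the error drops relative to the baseline."""
    return (baseline_error - new_error) / baseline_error
```

With the figures reported in the abstract, `relative_improvement(81.6, 42.5)` evaluates to about 0.479, which matches the stated 47.9%.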
