
Transformer-based Pedestrian Video Inpainting Guided by Pseudo-Spatiotemporal Pose Correction Graph Convolutional Networks

  • Abstract: To address the problem of restoring occluded pedestrians in surveillance video, a pedestrian video inpainting method based on human pose is proposed: the incomplete pedestrian pose sequence is repaired first, and the missing body regions in the video frames are then inpainted under the guidance of the repaired pose sequence. The method uses OpenPose to extract the occluded human pose sequence from the video; because occlusion leaves some joints unrecognized or inaccurately localized, a pseudo-spatiotemporal graph convolutional network is proposed to repair the extracted poses and obtain a relatively accurate pose sequence. Guided by the repaired poses, a Transformer-based pedestrian video inpainting model is then proposed. On the Human3.6M dataset, the proposed method outperforms the comparison methods on all four metrics (PSNR, RMSE, SSIM, and LPIPS); in particular, RMSE improves by 9.50% and LPIPS by 21.67%.
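The abstract names a pseudo-spatiotemporal graph convolution for pose repair but does not spell out its construction. One common way to couple spatial and temporal joint relations in a single graph convolution is to flatten the T frames × J joints of a pose sequence into one graph, with skeleton edges within each frame and temporal edges linking the same joint across adjacent frames. The sketch below is a minimal hypothetical illustration of that idea, not the authors' model: the function names, the toy 4-joint skeleton, and the layer form ReLU(ÂXW) are all assumptions.

```python
import numpy as np

def build_adjacency(num_frames, skeleton_edges, num_joints):
    """Normalized adjacency over a 'pseudo-spatiotemporal' pose graph:
    spatial skeleton edges within each frame plus temporal edges that
    connect the same joint in adjacent frames (an assumed construction)."""
    n = num_frames * num_joints
    A = np.eye(n)                             # self-loops
    for t in range(num_frames):
        base = t * num_joints
        for i, j in skeleton_edges:           # spatial edges within frame t
            A[base + i, base + j] = A[base + j, base + i] = 1.0
        if t + 1 < num_frames:                # temporal edges to frame t+1
            nxt = (t + 1) * num_joints
            for j in range(num_joints):
                A[base + j, nxt + j] = A[nxt + j, base + j] = 1.0
    d = A.sum(axis=1)                         # symmetric normalization
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2} A D^{-1/2}
    return D_inv_sqrt @ A @ D_inv_sqrt

def gcn_layer(X, A_hat, W):
    """One graph-convolution layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

# Toy example: 3 frames, 4 joints in a chain, 2-D joint coordinates.
edges = [(0, 1), (1, 2), (2, 3)]
A_hat = build_adjacency(3, edges, 4)
X = np.random.default_rng(0).normal(size=(12, 2))   # (T*J, coord_dim)
W = np.random.default_rng(1).normal(size=(2, 8))    # learnable weights
H = gcn_layer(X, A_hat, W)
print(H.shape)                                      # (12, 8)
```

In a pose-repair setting, masked (unrecognized) joint coordinates would be filled with placeholders in X, and stacked layers of this kind would regress the missing positions from neighboring joints and frames; the Transformer inpainting stage is outside the scope of this sketch.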

