Articulated Motion Video Transfer Based on Self-Supervised Pose Alignment

Abstract: Motion video transfer is one of the key tasks in computer vision, but current methods have two shortcomings. First, the accuracy of extracting motion pose features that vary over time is insufficient. Second, the disentanglement of time-varying motion pose features from time-independent content features is flawed, degrading video transfer performance. To address these issues, we propose a two-stage network model. In the first stage, two encoders extract the motion pose features and the content features of a video, and a decoder performs feature fusion and video generation; a self-supervised pose alignment method is adopted to extract the motion features, and an adversarial loss, a cycle-consistency loss, and a cross-training technique are introduced to improve the disentanglement of the two. In the second stage, the video generated by the first-stage network is fine-tuned, focusing on improving its motion consistency. Comparative experiments, visualization experiments, and ablation experiments verify that the proposed model outperforms existing methods in video frame quality, structural similarity, and mean squared error.

     
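To make the stage-1 design concrete, below is a minimal PyTorch sketch of the two-encoder/decoder disentanglement idea together with the cross-training and cycle-consistency terms mentioned in the abstract. It is an illustration only: all module architectures, feature dimensions, the 32×32 frame size, and the loss weights are hypothetical, since the abstract does not specify them, and the adversarial term is omitted.

```python
# Sketch of the stage-1 idea: two encoders disentangle time-varying pose
# features from time-independent content features, and a decoder fuses
# them to regenerate frames. All sizes and weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PoseEncoder(nn.Module):
    """Per-frame encoder for time-varying motion/pose features."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, frames):              # (B, T, 3, H, W)
        b, t = frames.shape[:2]
        z = self.net(frames.flatten(0, 1))  # encode each frame independently
        return z.view(b, t, -1)             # (B, T, dim): one pose per frame


class ContentEncoder(nn.Module):
    """Video-level encoder for time-independent content features."""
    def __init__(self, dim=64):
        super().__init__()
        self.frame_enc = PoseEncoder(dim)   # reuse backbone, pool over time

    def forward(self, frames):
        return self.frame_enc(frames).mean(dim=1)  # (B, dim), time-pooled


class Decoder(nn.Module):
    """Fuses per-frame pose and per-video content to generate frames."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, pose, content):       # (B, T, d) and (B, d)
        b, t, _ = pose.shape
        z = torch.cat([pose, content.unsqueeze(1).expand(-1, t, -1)], dim=-1)
        out = self.net(z.flatten(0, 1))
        return out.view(b, t, *out.shape[1:])  # (B, T, 3, 32, 32)


def cross_training_step(pose_enc, content_enc, dec, video_a, video_b):
    """Swap content between two clips (cross training) and require that
    re-encoding the swapped result recovers the original pose features
    (a simple cycle-consistency term). Videos are assumed normalized to
    [-1, 1] with shape (B, T, 3, 32, 32); the 0.1 weight is arbitrary."""
    pose_a = pose_enc(video_a)
    content_b = content_enc(video_b)
    swapped = dec(pose_a, content_b)        # video_a's motion, video_b's look
    cycle = F.l1_loss(pose_enc(swapped), pose_a.detach())
    recon = F.l1_loss(dec(pose_a, content_enc(video_a)), video_a)
    return recon + 0.1 * cycle              # adversarial term omitted here
```

The key point of the sketch is that pose features are kept per frame while content is pooled over time, so swapping content between two clips transfers appearance without disturbing the motion trajectory.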
