He Yulin, Peng Shujuan, Liu Xin, and Cui Zhen. Efficient Cross-Modal Action Matching via Bidirectional Hybrid Constraint and Elastic Verification Mechanism[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(4): 503-515. DOI: 10.3724/SP.J.1089.2023.19395

Efficient Cross-Modal Action Matching via Bidirectional Hybrid Constraint and Elastic Verification Mechanism

Abstract: Owing to the complexity of video and skeleton data and the semantic gap between them, existing action matching methods cannot adequately solve the association matching problem between motion data of different modalities. To address this, this paper proposes a cross-modal action matching method for RGB video and 3D skeleton data. First, a cross-modal action matching framework is designed to mine the common semantic information shared by RGB video data and skeleton sequence data. Second, a weight-sharing multimodal dual-layer residual structure and a bidirectional hybrid constraint are introduced to mine inter-modal associations and generate cross-modal representations in a shared semantic embedding, which improves data utilization and model performance. Finally, an elastic verification module is proposed that drives the network to focus on learning discriminative action features in the shared semantic space, improving the model's generalization. Experimental results show that the framework handles action matching between RGB video and skeleton sequences more effectively, and that it outperforms three existing baseline algorithms on the cross-modal ACC and MAP metrics on the NTU-RGBD and JHMDB datasets, bridging actions across heterogeneous modalities.
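The abstract reports cross-modal MAP but does not describe the evaluation protocol. As a minimal sketch, assuming that queries from one modality are ranked against a gallery from the other by cosine similarity of their shared-space embeddings, and that a gallery item is relevant when it carries the same action label as the query, mean average precision could be computed as follows. The function names and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list; entries are True when relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def cross_modal_map(
    query_emb: List[Sequence[float]],
    query_labels: List[int],
    gallery_emb: List[Sequence[float]],
    gallery_labels: List[int],
) -> float:
    """Mean average precision for queries of one modality against the other."""
    aps = []
    for q, q_label in zip(query_emb, query_labels):
        # Rank gallery items from most to least similar to the query.
        order = sorted(
            range(len(gallery_emb)),
            key=lambda i: cosine(q, gallery_emb[i]),
            reverse=True,
        )
        aps.append(average_precision([gallery_labels[i] == q_label for i in order]))
    return sum(aps) / len(aps) if aps else 0.0


# Toy example (assumed data): two video queries, three skeleton gallery items.
video_emb = [[1.0, 0.0], [0.0, 1.0]]
video_labels = [0, 1]
skeleton_emb = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
skeleton_labels = [1, 1, 0]

video_to_skeleton_map = cross_modal_map(
    video_emb, video_labels, skeleton_emb, skeleton_labels
)
```

Swapping the query and gallery arguments gives the skeleton-to-video direction, so both retrieval directions can be scored with the same function.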

     
