Motion-Guided Graph Convolutional Network for Human Action Recognition

Abstract: Current skeleton-based human action recognition methods cannot model how the dependencies between joints change over time, and they struggle to exchange information across space and time. To address these problems, a novel motion-guided graph convolutional network (M-GCN) is proposed. First, high-level motion features are extracted from the skeleton sequence. Second, motion-dependent correlation graphs are learned along the time dimension and used to refine the predefined graph and the learnable graph, yielding motion-guided topologies that capture the joint dependencies at different periods. Third, the motion-guided topologies are used for spatial graph convolution, so that motion information is fused into the spatial graph convolution and spatial-temporal information can interact. Finally, spatial and temporal graph convolutions are applied alternately to recognize human actions. The method is compared with graph convolution methods such as MS-G3D on the NTU-RGB+D and NTU-RGB+D 120 datasets. The results show that the proposed method raises accuracy to 92.3% and 96.7% on the cross-subject and cross-view benchmarks of NTU-RGB+D, and to 88.8% and 90.2% on the cross-subject and cross-setup benchmarks of NTU-RGB+D 120, respectively.
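The pipeline described in the abstract (motion features, a per-frame motion correlation graph, a motion-guided topology, then spatial graph convolution) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything concrete here is an assumption made for illustration: motion features are taken as frame-to-frame differences, the correlation graph is a softmax over dot-product similarities, and the three graphs are combined by simple addition. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def motion_features(x):
    """Assumed motion feature: difference between consecutive frames.

    x has shape (T, V, C): T frames, V joints, C channels.
    The last frame has no successor, so its motion is left as zero.
    """
    m = np.zeros_like(x)
    m[:-1] = x[1:] - x[:-1]
    return m


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def motion_guided_topology(x, a_pre, a_learn):
    """Build one (V, V) graph per frame from motion similarity (assumed form).

    a_pre is the predefined skeleton graph, a_learn a learnable graph;
    both are (V, V) and shared across frames. The motion correlation
    graph differs per frame, so the result has shape (T, V, V).
    """
    m = motion_features(x)                        # (T, V, C)
    scores = m @ m.transpose(0, 2, 1)             # (T, V, V) joint-to-joint similarity
    a_motion = softmax(scores / np.sqrt(x.shape[-1]), axis=-1)
    return a_pre[None] + a_learn[None] + a_motion


def spatial_graph_conv(x, topo, w):
    """Spatial graph convolution per frame: ReLU(A_t X_t W)."""
    return np.maximum(topo @ x @ w, 0.0)


# Toy input: 4 frames, 5 joints, 3 coordinates per joint.
rng = np.random.default_rng(0)
T, V, C, C_OUT = 4, 5, 3, 8
x = rng.normal(size=(T, V, C))
a_pre = np.eye(V)             # placeholder for a real skeleton adjacency
a_learn = np.zeros((V, V))    # would be a trained parameter in practice
w = rng.normal(size=(C, C_OUT))

topo = motion_guided_topology(x, a_pre, a_learn)
out = spatial_graph_conv(x, topo, w)
print(topo.shape, out.shape)
```

Because the topology is recomputed for every frame, the spatial aggregation weights follow the motion of the sequence, which is one simple way to let temporal information influence the spatial convolution. A full model would alternate this layer with temporal convolutions and train `a_learn` and `w` end to end.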

     
