Citation: Li Jingjing, Huang Zhangjin, Zou Lu. Motion-Guided Graph Convolutional Network for Human Action Recognition[J]. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(7): 1077-1086. DOI: 10.3724/SP.J.1089.2024.19898

Motion-Guided Graph Convolutional Network for Human Action Recognition

  • Abstract: Current skeleton-based human action recognition methods cannot model how the dependencies between joints change over time, and they struggle to exchange information across space and time. To address these problems, a motion-guided graph convolutional network (M-GCN) is proposed. First, high-level motion features are extracted from the skeleton sequence. Second, motion-dependent correlation graphs are learned along the time dimension and used to optimize the predefined graphs and the learnable graphs, yielding motion-guided topologies that capture the joint dependencies at different periods. Third, the motion-guided topologies are used for spatial graph convolution, which fuses motion information into the spatial graph convolution and thereby realizes cross-spatial-temporal information interaction. Finally, spatial and temporal graph convolutions are applied alternately to recognize human actions. In comparative experiments against graph convolutional networks such as MS-G3D on the NTU-RGB+D and NTU-RGB+D 120 datasets, the proposed method raises accuracy to 92.3% and 96.7% on the cross-subject and cross-view benchmarks of NTU-RGB+D, and to 88.8% and 90.2% on the cross-subject and cross-setup benchmarks of NTU-RGB+D 120.
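The abstract does not give the layer equations, so the following is only a rough NumPy sketch of the general idea of a per-frame, motion-dependent topology feeding a spatial graph convolution. Several choices here are assumptions made for illustration and are not taken from the paper: frame-to-frame joint displacement stands in for the "high-level motion features", a softmax over dot-product similarity stands in for the learned motion-dependent correlation, and the predefined, learnable, and motion graphs are simply summed. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def normalize_adjacency(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a skeleton graph."""
    a = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def motion_features(x):
    """x: (T, V, C) joint coordinates over T frames and V joints.

    Assumption: motion is approximated by the displacement to the next
    frame; the last frame has no successor and is left at zero.
    """
    m = np.zeros_like(x)
    m[:-1] = x[1:] - x[:-1]
    return m


def motion_guided_topology(m, a_predefined, a_learnable):
    """Build one (V, V) graph per frame, returned as (T, V, V).

    Assumption: the motion-dependent graph is a row-wise softmax over the
    pairwise dot products of joint motion vectors within each frame.
    """
    scores = np.einsum("tvc,twc->tvw", m, m)
    a_motion = softmax(scores, axis=-1)
    return a_predefined[None] + a_learnable[None] + a_motion


def spatial_graph_conv(x, topology, w):
    """Aggregate neighbor features with a frame-specific graph, then project.

    x: (T, V, C_in), topology: (T, V, V), w: (C_in, C_out).
    """
    aggregated = np.einsum("tvw,twc->tvc", topology, x)
    return aggregated @ w


# Toy example: a 5-joint chain skeleton observed for 4 frames in 3-D.
rng = np.random.default_rng(0)
T, V, C_in, C_out = 4, 5, 3, 8
chain = np.zeros((V, V))
for i in range(V - 1):
    chain[i, i + 1] = chain[i + 1, i] = 1.0

x = rng.normal(size=(T, V, C_in))
a_pre = normalize_adjacency(chain)
a_learn = np.zeros((V, V))            # would be a trained parameter
w = rng.normal(size=(C_in, C_out))    # would be a trained parameter

topology = motion_guided_topology(motion_features(x), a_pre, a_learn)
out = spatial_graph_conv(x, topology, w)
print(out.shape)  # (4, 5, 8)
```

Because the topology has a leading time axis, each frame aggregates over its own graph, which is one simple way motion information can influence the spatial convolution. In a full network such a layer would alternate with temporal convolutions, as the abstract describes.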

     
