Citation: Li Xuanye, Hao Xingwei, Jia Jingong, Zhou Yuanfeng. Human Action Recognition Method Based on Multi-Attention Mechanism and Spatiotemporal Graph Convolution Networks[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(7): 1055-1063. DOI: 10.3724/SP.J.1089.2021.18640

Human Action Recognition Method Based on Multi-Attention Mechanism and Spatiotemporal Graph Convolution Networks

Abstract: Human action recognition is a challenging task in computer vision because spatial and temporal information is difficult to combine. This paper proposes a multi-attention spatiotemporal graph convolution network. The core idea is to construct a connected graph from the time-series information and the natural connections of the human skeleton, then use a spatiotemporal graph convolution network with multiple attention mechanisms to automatically learn spatial and temporal features while optimizing the connected graph, and finally predict the human action. With the graph attention module, the topology of the graph built by the model is optimized during network training after initialization, yielding a topology better suited to representing human actions. In addition, a channel attention module makes the network attend more to the relatively important channel information, so that it extracts features that describe actions more effectively. Extensive experiments on the widely recognized large-scale datasets NTU-RGBD and Kinetics show that the method achieves higher recognition accuracy.
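The abstract does not give the layer equations, so the following is only a minimal NumPy sketch of the two ideas it names, under stated assumptions. It assumes the graph attention is a learnable additive offset `B` on the normalized skeleton adjacency, and that the channel attention is a squeeze-and-excitation style gate. The function names, tensor layout `(channels, frames, joints)`, and the 5-joint toy skeleton are hypothetical and are not taken from the paper. The temporal convolution is omitted.

```python
import numpy as np


def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix A."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def spatial_graph_conv(x, A_norm, B, W):
    """Spatial graph convolution with an adjustable topology (assumed form).

    x: (C_in, T, V) features, A_norm and B: (V, V), W: (C_out, C_in).
    B stands in for a trainable offset that lets training reshape the graph.
    """
    agg = np.einsum("ctv,vw->ctw", x, A_norm + B)   # aggregate over joints
    return np.einsum("oc,ctw->otw", W, agg)         # mix channels


def channel_attention(x, W1, W2):
    """SE-style channel gate (assumed form).

    x: (C, T, V), W1: (C // r, C), W2: (C, C // r). Returns gated x and the gate.
    """
    s = x.mean(axis=(1, 2))                     # squeeze: one value per channel
    h = np.maximum(W1 @ s, 0.0)                 # bottleneck with ReLU
    g = 1.0 / (1.0 + np.exp(-(W2 @ h)))         # sigmoid gate per channel
    return x * g[:, None, None], g


# Toy usage: a 5-joint chain skeleton, 4 frames, 3 input channels (x, y, z).
rng = np.random.default_rng(0)
V, T, C_in, C_out = 5, 4, 3, 8
A = np.zeros((V, V))
for i in range(V - 1):                          # bones connect neighbouring joints
    A[i, i + 1] = A[i + 1, i] = 1.0

x = rng.normal(size=(C_in, T, V))
B = np.zeros((V, V))                            # offset starts at zero before training
W = rng.normal(size=(C_out, C_in))
y = spatial_graph_conv(x, normalize_adjacency(A), B, W)
out, gate = channel_attention(y, rng.normal(size=(2, C_out)), rng.normal(size=(C_out, 2)))
print(out.shape, gate.shape)
```

In a trainable version, `B`, `W`, `W1`, and `W2` would be parameters updated by backpropagation. That is the sense in which the topology is "optimized during training" in this sketch.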

     
