Feng Yan, Li Ge, Yuan Chunfeng, Wang Chuanxu. Spatio-Temporal Attention Deep Network for Skeleton Based View-Invariant Human Action Recognition[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(12): 2271-2277. DOI: 10.3724/SP.J.1089.2018.17095

Spatio-Temporal Attention Deep Network for Skeleton Based View-Invariant Human Action Recognition

  • Abstract: To address the problems that single-view skeleton data contain noise and that the extracted features depend entirely on the capturing view, a spatio-temporal attention deep network is proposed for view-invariant skeleton-based action recognition. The network consists of view-specific sub-networks connected in series with a common sub-network. First, each view-specific sub-network learns discriminative features from the sequence of its view, using a spatial attention module and a temporal attention module to focus on key joints and key frames, respectively. The outputs of the view-specific sub-networks are then fed into the common sub-network, which further learns view-invariant features, and the network finally outputs the action classification result. To ensure effective training, a regularized cross-entropy loss is proposed to drive the joint learning of the network's modules. Experiments show that the model achieves a recognition accuracy of 76.3% on the NTU dataset, currently the largest skeleton-based action recognition dataset.
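The abstract describes the architecture only at a high level. Below is a minimal PyTorch-style sketch of that structure, assuming per-frame spatial attention over joints, temporal attention over frames, and view-specific sub-networks feeding a shared classifier. All module names, layer sizes, and the concatenation-based fusion are illustrative assumptions rather than the paper's actual design, and the proposed regularized cross-entropy loss is not reproduced here.

```python
# Illustrative sketch only; module names and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Scores each joint within a frame so that key joints receive larger weights."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, x):                      # x: (batch, frames, joints, feat)
        w = F.softmax(self.score(x), dim=2)    # attention over the joint axis
        return (w * x).sum(dim=2), w           # (batch, frames, feat)

class TemporalAttention(nn.Module):
    """Scores each frame so that key frames dominate the sequence representation."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, h):                      # h: (batch, frames, feat)
        w = F.softmax(self.score(h), dim=1)    # attention over the frame axis
        return (w * h).sum(dim=1), w           # (batch, feat)

class ViewSpecificSubNet(nn.Module):
    """One per camera view: joint attention -> LSTM over frames -> frame attention."""
    def __init__(self, joint_dim=3, feat=128):
        super().__init__()
        self.spatial = SpatialAttention(joint_dim)
        self.rnn = nn.LSTM(joint_dim, feat, batch_first=True)
        self.temporal = TemporalAttention(feat)

    def forward(self, x):                      # x: (batch, frames, joints, 3)
        frame_feat, _ = self.spatial(x)        # weighted sum of joints per frame
        h, _ = self.rnn(frame_feat)            # temporal modelling of the frame features
        seq_feat, _ = self.temporal(h)         # weighted sum of frames
        return seq_feat                        # (batch, feat)

class STAttentionNet(nn.Module):
    """View-specific sub-networks followed by a common (shared) classification sub-network."""
    def __init__(self, num_views=3, num_classes=60, feat=128):
        super().__init__()
        self.views = nn.ModuleList(ViewSpecificSubNet(feat=feat) for _ in range(num_views))
        self.common = nn.Sequential(nn.Linear(num_views * feat, feat), nn.ReLU(),
                                    nn.Linear(feat, num_classes))

    def forward(self, xs):                     # xs: list of per-view skeleton tensors
        feats = [net(x) for net, x in zip(self.views, xs)]
        return self.common(torch.cat(feats, dim=1))

# Usage: three synchronized views, 40 frames, 25 joints (NTU-style), 3D coordinates.
model = STAttentionNet(num_views=3, num_classes=60)
views = [torch.randn(2, 40, 25, 3) for _ in range(3)]
logits = model(views)                          # shape (2, 60)
```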

     

