Local Feature Fusion Temporal Convolutional Network for Human Action Recognition


    Abstract: Aiming at the problem of action recognition from three-dimensional human skeleton sequences, a temporal convolutional network (TCN) method combining local feature fusion is proposed. First, the global spatial feature of the skeleton sequence is extracted by modeling the changes in the spatial positions of all joints of the skeleton sequence over an action. Then, according to the topological structure of the human body's joints and their connections, the global spatial features are partitioned into local spatial features of body parts, and each local spatial feature is fed as input to a corresponding TCN to learn the internal feature relations within each part. Finally, the output feature vectors of the parts are fused to learn the cooperative relationships among the body parts, thereby completing the recognition of the action. Classification experiments with the proposed method are carried out on the highly challenging NTU-RGB+D dataset. The results show that, compared with existing CNN-, LSTM-, and TCN-based methods, the classification accuracies under the cross-subject and cross-view protocols are improved to 79.5% and 84.6%, respectively.
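The pipeline described above (partition global skeleton features by body topology, apply a temporal convolution per part, then fuse the part outputs) can be sketched minimally in NumPy. This is an illustrative sketch only, not the paper's model: the joint grouping below is a hypothetical five-part split of the 25 NTU-RGB+D joints, and a fixed averaging kernel stands in for the learned TCN layers.

```python
import numpy as np

# Hypothetical partition of the 25 NTU-RGB+D joints into five body parts
# (indices are illustrative, not the paper's exact grouping).
PARTS = {
    "torso":     [0, 1, 2, 3, 20],
    "left_arm":  [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def temporal_conv(x, kernel):
    """1D convolution along the time axis, applied per feature channel.
    x: (T, C) sequence; kernel: (K,) temporal filter."""
    return np.stack([np.convolve(x[:, c], kernel, mode="valid")
                     for c in range(x.shape[1])], axis=1)

def local_fusion_features(skeleton, kernel):
    """skeleton: (T, 25, 3) sequence of 3D joint positions.
    Per part: flatten joints -> temporal conv -> mean-pool over time;
    then concatenate the part vectors into one fused descriptor."""
    part_vecs = []
    for joints in PARTS.values():
        local = skeleton[:, joints, :].reshape(len(skeleton), -1)  # (T, 3*|part|)
        feat = temporal_conv(local, kernel)                         # (T-K+1, 3*|part|)
        part_vecs.append(feat.mean(axis=0))                         # temporal pooling
    return np.concatenate(part_vecs)  # fused vector fed to a classifier

rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 25, 3))          # 30 frames, 25 joints, 3D coords
vec = local_fusion_features(seq, np.ones(3) / 3)
print(vec.shape)                                 # (75,) = 3 coords x 25 joints
```

In the actual method the fused part representations are learned jointly with a classifier, so the fusion step captures inter-part cooperation rather than simple concatenation of fixed statistics.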
