
A Hierarchical Graph Convolutional Method for Gesture Recognition Based on Temporal-Frequency Data Fusion

Abstract: Graph convolutional neural networks (GCNs) are adept at extracting features from non-Euclidean structures and have become the mainstream approach to skeleton-based gesture recognition. To address the long training and inference times of GCN-based methods, and to further improve their accuracy, a hierarchical graph convolutional gesture recognition method based on temporal-frequency data fusion, named HHTS-Net, is proposed. First, a framework that combines spatial attention blocks with graph convolution modules is proposed to reduce the computational burden. Second, a hierarchical graph convolution module tailored to the structure of the palm and finger joints is introduced to improve the efficiency of feature extraction. Finally, a multi-stream learning scheme that fuses frequency-domain features is proposed to further improve model performance. Compared with representative methods on the public DHG and SHREC’17 datasets, HHTS-Net improves accuracy by at least 1.8% and 0.3%, respectively, and speeds up inference by more than 47.6% and 52.9%. Extensive ablation experiments validate the effectiveness of the method. The source code is available at https://github.com/CoderHoooK/HHTSATNet.
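As a rough illustration of the three ideas named in the abstract, the sketch below (PyTorch, not the authors' released code) wires a per-frame spatial attention block into a plain graph convolution, pools joint-level features into coarser palm/finger nodes for a second, hierarchical graph convolution, and adds a second input stream built from the FFT magnitude of the joint trajectories. The joint count, the palm/finger grouping, the layer sizes, and the concatenation-based fusion are all assumptions; the linked repository should be consulted for HHTS-Net's actual architecture.

# Minimal two-stream, hierarchical hand GCN sketch (illustrative only, not HHTS-Net itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_JOINTS = 22   # assumed hand-skeleton size (DHG/SHREC'17 use 22 joints)
NUM_GROUPS = 6    # assumed coarse nodes: palm + five fingers

class SpatialAttention(nn.Module):
    """Per-frame attention over joints, applied before graph convolution."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Linear(channels, 1)

    def forward(self, x):                            # x: (B, T, V, C)
        w = torch.softmax(self.score(x), dim=2)      # attention weights over the joint axis V
        return x * w

class GraphConv(nn.Module):
    """Plain graph convolution X' = ReLU(A X W) with a fixed, normalized adjacency."""
    def __init__(self, in_c, out_c, adj):
        super().__init__()
        self.register_buffer("adj", adj)             # (V, V)
        self.proj = nn.Linear(in_c, out_c)

    def forward(self, x):                            # x: (B, T, V, C)
        x = torch.einsum("uv,btvc->btuc", self.adj, x)
        return F.relu(self.proj(x))

class HierarchicalStream(nn.Module):
    """Joint-level graph conv, then pooling into palm/finger groups for a coarser graph conv."""
    def __init__(self, in_c, hid_c, joint_adj, group_of_joint, group_adj):
        super().__init__()
        self.att = SpatialAttention(in_c)
        self.gc_joint = GraphConv(in_c, hid_c, joint_adj)
        self.gc_group = GraphConv(hid_c, hid_c, group_adj)
        pool = torch.zeros(NUM_GROUPS, NUM_JOINTS)   # one-hot map: joints -> palm/finger groups
        pool[group_of_joint, torch.arange(NUM_JOINTS)] = 1.0
        self.register_buffer("pool", pool / pool.sum(dim=1, keepdim=True))

    def forward(self, x):                            # x: (B, T, V, C)
        h = self.gc_joint(self.att(x))               # fine (joint-level) features
        g = torch.einsum("gv,btvc->btgc", self.pool, h)  # coarse (group-level) features
        return self.gc_group(g).mean(dim=(1, 2))     # global pooling -> (B, hid_c)

class TwoStreamHandGCN(nn.Module):
    """Time-domain stream plus an FFT-magnitude stream, fused by concatenation."""
    def __init__(self, in_c, hid_c, num_classes, joint_adj, group_of_joint, group_adj):
        super().__init__()
        self.time_stream = HierarchicalStream(in_c, hid_c, joint_adj, group_of_joint, group_adj)
        self.freq_stream = HierarchicalStream(in_c, hid_c, joint_adj, group_of_joint, group_adj)
        self.classifier = nn.Linear(2 * hid_c, num_classes)

    def forward(self, x):                            # x: (B, T, V, C) joint coordinates
        f_time = self.time_stream(x)
        spec = torch.fft.rfft(x, dim=1).abs()        # magnitude spectrum along the time axis
        f_freq = self.freq_stream(spec)
        return self.classifier(torch.cat([f_time, f_freq], dim=1))

In use, joint_adj and group_adj would be row-normalized adjacency matrices of the hand skeleton and of the palm/finger groups, group_of_joint a length-22 index tensor, and the input a tensor of shape (batch, frames, 22, 3); a call such as TwoStreamHandGCN(3, 64, 14, joint_adj, group_of_joint, group_adj)(x) returns class logits. Again, this only illustrates the general structure described in the abstract, not the published model.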

