Spatio-Temporal Attention Deep Network for Skeleton Based View-Invariant Human Action Recognition
-
Graphical Abstract
-
Abstract
To address the noise and view dependency of single-view skeleton data, a deep network based on a spatio-temporal attention model is proposed for view-invariant skeleton-based action recognition. The deep network consists of multiple view-specific sub-networks and a common sub-network. First, each view-specific sub-network extracts view-discriminative features, combining a spatial attention module and a temporal attention module to focus on key joints and key frames. Then, the discriminative features are fed into the common sub-network to learn view-invariant features. Finally, the deep network outputs the action classification results. Experiments show that the model achieves 76.3% recognition accuracy on NTU, currently the largest action recognition dataset.
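The data flow described above (spatial attention over joints, temporal attention over frames, then a shared classifier) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify layer types, feature sizes, or how the attention scores are computed, so the linear scoring vectors `w_s` and `w_t`, the single linear layer standing in for the common sub-network, all function names, and the random inputs are assumptions made purely for illustration. The shapes (25 joints, 3D coordinates, 60 classes) follow the NTU RGB+D skeleton format.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(seq, w_s):
    # seq: (T, J, C) skeleton sequence; w_s: (C,) hypothetical scoring vector.
    # Produces one weight per joint in every frame (weights sum to 1 over joints).
    alpha = softmax(seq @ w_s, axis=1)           # (T, J)
    return seq * alpha[..., None], alpha

def temporal_attention(frame_feats, w_t):
    # frame_feats: (T, D) per-frame features; w_t: (D,) hypothetical scoring vector.
    # Produces one weight per frame and returns the weighted sum over time.
    beta = softmax(frame_feats @ w_t, axis=0)    # (T,)
    return (beta[:, None] * frame_feats).sum(axis=0), beta

def view_specific_features(seq, w_s, w_t):
    # Stand-in for one view-specific sub-network: key joints, then key frames.
    weighted, alpha = spatial_attention(seq, w_s)
    frame_feats = weighted.reshape(seq.shape[0], -1)   # (T, J*C)
    feat, beta = temporal_attention(frame_feats, w_t)
    return feat, alpha, beta

def common_subnetwork(feat, W, b):
    # Stand-in for the common sub-network: one linear layer plus softmax.
    return softmax(feat @ W + b, axis=0)

# Illustrative run on random data (untrained weights, not real results).
rng = np.random.default_rng(0)
T, J, C, num_classes = 8, 25, 3, 60
seq = rng.normal(size=(T, J, C))
w_s = rng.normal(size=C)
w_t = rng.normal(size=J * C)
W = rng.normal(size=(J * C, num_classes))
b = np.zeros(num_classes)

feat, alpha, beta = view_specific_features(seq, w_s, w_t)
probs = common_subnetwork(feat, W, b)
```

Here `alpha` holds the per-frame joint weights and `beta` the per-frame weights. In a multi-view setting, each view would presumably have its own `w_s` and `w_t`, while `W` and `b` would be shared across views.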
-
-