Abstract:
Human action recognition plays a vital role in video understanding. In recent years, skeleton-based action recognition has gained widespread attention because of its robustness to environmental interference. This paper compiles 100 skeleton-based human action recognition methods and comparatively analyzes their performance on nine public datasets. The methods are organized by learning paradigm into handcrafted-feature methods and deep-learning-based methods. Specifically, the handcrafted-feature methods are divided into three categories according to their feature descriptors: geometric, kinetic, and statistical representations. The deep-learning-based methods are classified by backbone into five subclasses: recurrent neural networks, convolutional neural networks, graph convolutional networks, transformers, and hybrid networks. Through this comprehensive analysis, we present the current state of research in skeleton-based action recognition and summarize its open challenges and future directions, with the aim of promoting further research in this field.