Efficient Cross-Modal Action Matching via Bidirectional Hybrid Constraint and Elastic Verification Mechanism
-
Abstract
Existing action matching methods struggle to capture the semantic correspondence between heterogeneous video and skeleton motion data, mainly because of the complexity of the data and the semantic gap between the two modalities. To address these issues, this paper presents an efficient cross-modal action matching algorithm that semantically links RGB video and 3D skeleton data. First, an efficient cross-modal action matching framework is designed to mine the semantic information shared by RGB video and skeleton motion data. Second, a dual-residual layer structure and a bidirectional hybrid constraint are employed to learn cross-modal associations and the corresponding shared representations, which improves data utilization and enhances model performance. Finally, an elastic verification module is designed to learn discriminative action units within the network. Experimental results on the NTU RGB+D and JHMDB datasets show that the proposed framework effectively performs cross-modal action matching between heterogeneous RGB videos and skeleton sequences, achieving higher accuracy (ACC) and mean average precision (mAP).
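The abstract does not specify the exact form of the bidirectional hybrid constraint or how ACC and mAP are computed. As a rough orientation only, the Python sketch below shows a standard bidirectional hinge ranking loss between paired video and skeleton embeddings, together with top-1 accuracy and label-based mean average precision for cross-modal retrieval. The function names, the use of cosine similarity, the margin of 0.2, and the "same action label means relevant" criterion are all illustrative assumptions rather than the authors' formulation; the "hybrid" part of the constraint, the dual-residual encoders, and the elastic verification module are not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(x: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def similarity_matrix(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """sim[i][j] = cosine similarity between a[i] and b[j]."""
    an = [normalize(x) for x in a]
    bn = [normalize(y) for y in b]
    return [[sum(p * q for p, q in zip(x, y)) for y in bn] for x in an]


def bidirectional_ranking_loss(video, skeleton, margin: float = 0.2) -> float:
    """Hinge ranking loss in both directions (assumed form, not the paper's).

    video[i] and skeleton[i] are assumed to be a matched pair; all other
    rows in the batch act as negatives.
    """
    sim = similarity_matrix(video, skeleton)
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Video i -> skeleton: true skeleton i should beat skeleton j by `margin`.
            total += max(0.0, margin - sim[i][i] + sim[i][j])
            # Skeleton i -> video: true video i should beat video j by `margin`.
            total += max(0.0, margin - sim[i][i] + sim[j][i])
    return total / n


def top1_accuracy(queries, gallery, query_labels, gallery_labels) -> float:
    """ACC: fraction of queries whose most similar gallery item shares their label."""
    sim = similarity_matrix(queries, gallery)
    correct = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=lambda j: row[j])
        correct += gallery_labels[best] == query_labels[i]
    return correct / len(sim)


def mean_average_precision(queries, gallery, query_labels, gallery_labels) -> float:
    """mAP: a gallery item is treated as relevant if it shares the query's label."""
    sim = similarity_matrix(queries, gallery)
    aps = []
    for i, row in enumerate(sim):
        order = sorted(range(len(row)), key=lambda j: -row[j])  # most similar first
        hits, precisions = 0, []
        for rank, j in enumerate(order, start=1):
            if gallery_labels[j] == query_labels[i]:
                hits += 1
                precisions.append(hits / rank)  # precision at each relevant position
        if precisions:
            aps.append(sum(precisions) / len(precisions))
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Toy embeddings: three video clips and their paired skeleton sequences.
    video = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    skeleton = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    labels = [0, 1, 2]
    print(bidirectional_ranking_loss(video, skeleton))
    print(top1_accuracy(video, skeleton, labels, labels))
    print(mean_average_precision(video, skeleton, labels, labels))
```

Applying the hinge in both directions penalizes a video that is closer to the wrong skeleton and, symmetrically, a skeleton that is closer to the wrong video, which is one common way to enforce a shared embedding space across two modalities.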
-
-