Local Correspondence Aware Method for Cross-Modal Learning on Point Clouds
Graphical Abstract
Abstract
To address the insufficient exploration of feature complementarity and correlation in cross-modal learning, this paper proposes a novel local correspondence aware method for cross-modal learning on point clouds. Within a dual-channel learning framework, a local correspondence aware module computes the local semantic correlation between point cloud features and image features via a constructed image semantic guidance matrix, and enhances the point cloud feature representation through attention weighting. A residual mechanism further provides semantic feature compensation, strengthening the semantic guidance in cross-modal feature learning. In addition, a self-supervised cross-modal learning strategy incorporates 3D point contrastive learning guided by 2D image semantic features, establishing fine-grained feature associations both across and within modalities and improving the adaptability of feature learning. Finally, the model reconstructs inputs in the image and point cloud feature spaces and jointly optimizes a reconstruction loss, a contrastive loss, and a cross-modal consistency loss, significantly improving the learning performance of the network. Experimental results demonstrate that the proposed method improves information interaction in cross-modal learning and, through image semantic guidance, enhances the robustness of feature learning. Evaluated via linear probing, the method achieves 91.61% classification accuracy and 86.4% segmentation accuracy on 3D shape tasks, outperforming the baseline by 5.37 and 1.2 percentage points on average, respectively.
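The core operation described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the use of cosine similarity as the guidance matrix, and the plain residual addition are all assumptions made for illustration. It shows one plausible reading of the pipeline: build a point-to-image semantic guidance matrix, turn it into per-point attention over image regions, gather image semantics onto each point, and apply residual semantic compensation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_correspondence_enhance(point_feats, image_feats):
    """Hypothetical sketch of local correspondence aware enhancement.

    point_feats: (N_points, D) point cloud features
    image_feats: (N_regions, D) image region features
    Returns point features enhanced by attention-weighted image semantics.
    """
    # Guidance matrix: cosine-style similarity between every point and
    # every image region (an assumed form of the "semantic guidance matrix").
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    i = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    guidance = p @ i.T                      # (N_points, N_regions)

    # Attention weighting: each point attends over image regions.
    attn = softmax(guidance, axis=1)

    # Gather image semantics per point, then residual compensation.
    guided = attn @ image_feats             # (N_points, D)
    return point_feats + guided
```

A usage note: with `point_feats` of shape `(N, D)` and `image_feats` of shape `(M, D)`, the output keeps the point feature shape `(N, D)`, so the module can be dropped between encoder stages without changing downstream dimensions.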