Abstract:
Existing 2D-3D correspondence methods ignore the asymmetry in viewpoint and modality between images and point clouds, and they do not fully exploit the depth feature information of the image, which degrades the quality of dense correspondences. To address this, we propose a novel spatially consistent asymmetric 2D-3D dense correspondence method based on a coarse-to-fine matching strategy. First, the 2D image is processed by an image backbone network with a window fully-connected conditional random field (WFC-CRF) module, and the 3D point cloud is processed by a point cloud backbone network with positional embedding, so that multi-scale spatial features are extracted from both inputs. Next, in the coarse correspondence stage, multi-scale spatially consistent matching is performed between image patches and point cloud blocks. The Sinkhorn algorithm is used to post-process the block-level features, which reduces the errors caused by viewpoint and modality asymmetry and yields coarse block-level correspondences. Finally, in the fine correspondence stage, resampling and an attention mechanism refine the coarse matches into point-level asymmetric 2D-3D dense correspondences. The inlier ratio, registration recall rate, and feature recall rate reach 60.5%, 79.5%, and 93.7%, respectively, on the 7Scenes dataset, and 37.2%, 63.2%, and 93.5% on the RGB-D Scenes V2 dataset. These results indicate that the method improves the accuracy of dense correspondence computation and generalizes well across different scenes.
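To make the coarse-stage Sinkhorn step more concrete, the following is a minimal sketch of plain Sinkhorn normalization applied to a patch-to-block similarity matrix. It is an illustration under stated assumptions, not the formulation used in this work: the toy features, the cosine-similarity scores, the iteration count, and the helper names `sinkhorn` and `mutual_top1` are all hypothetical, and refinements such as slack ("dustbin") entries for unmatched patches are omitted.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))

def sinkhorn(scores, n_iters=100):
    """Alternately normalize rows and columns of exp(scores) in the log domain.

    For a square score matrix this approaches a doubly stochastic
    soft-assignment matrix.
    """
    log_p = scores.astype(np.float64)
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - logsumexp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)

def mutual_top1(p):
    """Keep (row, col) pairs that are each other's highest-scoring partner."""
    row_best = p.argmax(axis=1)
    col_best = p.argmax(axis=0)
    return [(i, j) for i, j in enumerate(row_best) if col_best[j] == i]

# Toy data (assumption): 4 image-patch features and 4 point-block features,
# where the block features are a noisy permutation of the patch features.
rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(4, 8))
block_feats = patch_feats[[2, 0, 3, 1]] + 0.01 * rng.normal(size=(4, 8))

# Unit-normalize so that the scores are cosine similarities in [-1, 1].
patch_feats /= np.linalg.norm(patch_feats, axis=1, keepdims=True)
block_feats /= np.linalg.norm(block_feats, axis=1, keepdims=True)

scores = patch_feats @ block_feats.T   # (patches x blocks) similarity matrix
assignment = sinkhorn(scores)          # soft patch-to-block assignment
matches = mutual_top1(assignment)      # candidate coarse correspondences
```

The normalization discourages several patches from claiming the same block, which is one common motivation for using Sinkhorn before selecting block-level matches; the mutual top-1 readout here is only one simple way to turn the soft assignment into discrete pairs.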