3D Object Detection Method with Image Semantic Feature Guidance and Cross-Modal Fusion of Point Cloud
Graphical Abstract
Abstract
Due to scene complexity, object scale variation, and occlusion, 3D object detection still faces many challenges. Cross-modal fusion of image and LiDAR point cloud features can effectively improve 3D object detection, but both the quality of the fusion and the resulting detection performance still need improvement. Therefore, this paper first designs an image semantic feature learning network that computes position and channel self-attention in two parallel branches, achieving global semantic enhancement and reducing object misclassification. Second, a local semantic fusion module guided by image semantic features is proposed: element-wise concatenation is used to fuse point cloud data with the corresponding local semantic features retrieved from the image, better addressing the semantic alignment problem in cross-modal information fusion. Third, a multi-scale re-fusion network is proposed, together with an interaction module between the fused features and the LiDAR features, to learn multi-scale relationships within the fused features and to re-fuse features at different resolutions, further improving detection performance. Finally, four task losses are adopted to train an anchor-free 3D multi-object detector. On the KITTI and nuScenes datasets, the method achieves a 3D object detection accuracy of 87.15%, and the experimental results show that it outperforms the compared methods and delivers better 3D detection performance.
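The position and channel dual-branch self-attention described above can be illustrated with a minimal sketch. This is not the paper's implementation; it is a toy, dependency-free version assuming standard scaled dot-product self-attention in each branch, with the two branch outputs fused by element-wise addition (the paper's actual fusion and normalization details are not specified in the abstract):

```python
import math

def matmul(A, B):
    # Plain-Python matrix product, used in place of a tensor library.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    # Scaled dot-product self-attention with Q = K = V = X (no learned projections).
    d = len(X[0])
    Xt = [list(col) for col in zip(*X)]
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(X, Xt)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, X)

def dual_branch_attention(X):
    # Position branch: attention over spatial positions (rows of X).
    pos = self_attention(X)
    # Channel branch: transpose so attention runs over feature channels, then transpose back.
    Xt = [list(col) for col in zip(*X)]
    chan = [list(col) for col in zip(*self_attention(Xt))]
    # Fuse the two branches element-wise (one simple choice; the paper may differ).
    return [[p + c for p, c in zip(pr, cr)] for pr, cr in zip(pos, chan)]

# Tiny example: 3 positions, 2 channels.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = dual_branch_attention(X)
```

Running both branches in parallel and fusing them lets each output element mix context along two axes: which positions matter (position branch) and which channels co-activate (channel branch), which is the global semantic enhancement the abstract refers to.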