Abstract:
3D object detection is a critical task in autonomous driving. Existing methods often suffer from complex architectures and high computational costs during cross-modal feature interaction, which limits their applicability. To address these challenges, this paper proposes a 3D object detection method based on cross-modal feature interaction and fusion. Specifically, a bird's-eye-view-based spatial feature alignment module is designed to achieve cross-modal feature alignment, thereby improving detection accuracy. To obtain a more comprehensive feature representation, we introduce a fusion strategy that combines multi-scale large-kernel convolutions with a cross-modal feature selection branch, which is more efficient and concise than traditional explicit alignment methods. Experimental results on the nuScenes dataset show that, compared with the baseline, our approach improves the nuScenes detection score (NDS) by 0.2 and the mean Average Precision (mAP) by 0.6, validating its effectiveness for multimodal 3D object detection.
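To make the fusion strategy concrete, the following is a minimal sketch of what a multi-scale large-kernel convolution fusion with a cross-modal feature selection (gating) branch could look like in PyTorch. All class names, kernel sizes, and module choices here are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MultiScaleLargeKernelFusion(nn.Module):
    """Hypothetical sketch of the fusion described in the abstract:
    multi-scale large-kernel depthwise convolutions over concatenated
    camera/LiDAR BEV features, plus a channel-wise selection (gating)
    branch. Names and hyperparameters are assumptions."""

    def __init__(self, channels: int, kernel_sizes=(7, 11, 15)):
        super().__init__()
        # Reduce concatenated camera+LiDAR features back to `channels`.
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Depthwise large-kernel branches at several receptive-field scales.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # Cross-modal feature selection: a per-channel gate computed
        # from the global context of both modalities.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor):
        # Both inputs: (B, C, H, W) BEV feature maps, assumed to be
        # spatially aligned by the upstream alignment module.
        x = torch.cat([cam_bev, lidar_bev], dim=1)
        fused = self.reduce(x)
        multi_scale = sum(branch(fused) for branch in self.branches)
        return multi_scale * self.gate(x)  # gated multi-scale fusion


# Usage: fuse two pre-aligned 128-channel BEV feature maps.
fusion = MultiScaleLargeKernelFusion(channels=128)
out = fusion(torch.randn(2, 128, 180, 180), torch.randn(2, 128, 180, 180))
```

The appeal of this design, as the abstract suggests, is that the gating branch replaces an explicit alignment stage: instead of warping one modality onto the other, the network learns to weight channels from each modality implicitly, keeping the fusion path cheap and concise.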