3D Object Detection Method with Image Semantic Feature Guidance and Cross-Modal Point Cloud Fusion
Abstract: Owing to scene complexity, object scale variation, and occlusion, 3D object detection still faces many challenges. Although cross-modal fusion of image and LiDAR point cloud information can effectively improve 3D object detection performance, both the fusion quality and the detection accuracy leave room for improvement. To this end, a 3D object detection method with image semantic feature guidance and cross-modal point cloud fusion is proposed. First, an image semantic feature learning network is designed, which adopts a position and channel dual-branch self-attention scheme computed in parallel to enhance global semantic features and reduce object misclassification. Second, a local fusion module guided by image semantic features is proposed: it uses element-wise concatenation to fuse the retrieved local image semantic features with the point cloud data, better addressing the semantic alignment problem in cross-modal information fusion. A multi-scale re-fusion network is then proposed, in which an interaction module between the fused features and the LiDAR point cloud learns the re-fusion of the fused features with features at different resolutions, improving the network's detection performance. Finally, four task losses are adopted to realize anchor-free 3D object detection. Experimental results on the KITTI and nuScenes datasets show that the proposed method outperforms existing state-of-the-art methods and achieves better 3D detection performance.
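The local fusion step described above retrieves, for each LiDAR point, the image semantic feature at the point's projected pixel and joins the two modalities by element-wise concatenation. The following is a minimal NumPy sketch of that gather-and-concatenate operation; the function name, tensor shapes, and projection indices are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fuse_point_image_features(point_feats, image_feats, pixel_idx):
    """Concatenate each point's feature with the image semantic feature
    retrieved at its projected pixel location (hypothetical shapes).

    point_feats: (N, Cp) per-point LiDAR features
    image_feats: (H, W, Ci) image semantic feature map
    pixel_idx:   (N, 2) integer (row, col) projection of each point
    returns:     (N, Cp + Ci) fused per-point features
    """
    rows, cols = pixel_idx[:, 0], pixel_idx[:, 1]
    retrieved = image_feats[rows, cols]  # gather per-point image features, (N, Ci)
    return np.concatenate([point_feats, retrieved], axis=1)

# Toy example with random data.
rng = np.random.default_rng(0)
pts = rng.random((5, 64))                      # 5 points, 64-dim point features
img = rng.random((8, 8, 32))                   # 8x8 semantic map, 32 channels
idx = rng.integers(0, 8, size=(5, 2))          # assumed projected pixel coords
fused = fuse_point_image_features(pts, img, idx)
print(fused.shape)  # (5, 96)
```

In practice the projection indices would come from the camera calibration (LiDAR-to-image transform), and the concatenation would feed the multi-scale re-fusion network described in the abstract.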