Cross-Modal Transformer Point Cloud Completion Algorithm Fusing Image Information

He Xing; Zhu Zhe; Yan Xuefeng; Guo Yanwen; Gong Lina; Wei Mingqiang

doi:10.3724/SP.J.1089.2024.19905

Cross-Modal Transformer for Point Cloud Completion

Abstract: Point clouds captured by 3D sensors such as LiDAR and depth cameras are often incomplete and must be completed before use. Single-modal completion methods tend to produce results that lack detail and have incomplete structure. To address these problems, a cross-modal Transformer point cloud completion algorithm that fuses image information is proposed. A point cloud branch and an image branch first extract point cloud features and image features, respectively; the point cloud branch uses PoinTr as its backbone, and the image branch uses 7 convolutional layers. A feature fusion module then fuses the two sets of features, and a full-resolution point cloud is generated in a coarse-to-fine manner. Experiments on the ShapeNet-ViPC dataset show that the proposed algorithm gives better visual results than single-modal point cloud completion methods and than ViPC, currently the only other cross-modal point cloud completion method, and that it achieves a better CD-L₂ than ViPC on most test categories. The average CD-L₂ is 2.74, which is 17% lower than that of ViPC. The code is available at https://github.com/Starak-x/ImPoinTr.
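
The CD-L₂ figure quoted in the abstract is a Chamfer distance computed with squared Euclidean distances. The abstract does not spell out the exact convention (normalization, scaling factor, number of points), so the following is only a minimal NumPy sketch of one common definition: the mean squared nearest-neighbor distance from the prediction to the ground truth, plus the same quantity in the opposite direction. The function name `chamfer_l2` and the toy data are illustrative assumptions and are not taken from the authors' repository.

```python
import numpy as np


def chamfer_l2(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance with squared Euclidean distances.

    pred: (N, 3) predicted points; gt: (M, 3) ground-truth points.
    Assumed convention: sum of the two directional means, no extra scaling.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    diff = pred[:, None, :] - gt[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    # For every point, squared distance to its nearest neighbor in the other set.
    pred_to_gt = d2.min(axis=1).mean()
    gt_to_pred = d2.min(axis=0).mean()
    return float(pred_to_gt + gt_to_pred)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(-1.0, 1.0, size=(512, 3))
    # Crude stand-in for an incomplete scan: keep only one half of the shape.
    partial = gt[gt[:, 0] > 0.0]
    print("identical clouds:", chamfer_l2(gt, gt))
    print("partial vs. full:", chamfer_l2(partial, gt))
```

The dense (N, M) distance matrix keeps the sketch short but uses memory proportional to N × M; evaluation code for large point clouds typically relies on a nearest-neighbor structure or a GPU kernel instead.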
