Cross-Modal Transformer Point Cloud Completion Algorithm Fusing Image Information
hexing@nuaa.edu.cn
Abstract: Point clouds acquired by 3D sensors (such as LiDAR and depth cameras) are often incomplete and require completion, while single-modal completion techniques suffer from insufficient detail and incomplete structure in their results. To address these problems, a cross-modal Transformer point cloud completion algorithm that fuses image information is proposed, in which a point cloud branch and an image branch extract point cloud features and image features, respectively. The point cloud branch adopts PoinTr as its backbone, and the image branch consists of seven convolutional layers. A feature fusion module fuses the point cloud features with the image features, and a full-resolution point cloud is generated in a coarse-to-fine manner. Experimental results on the ShapeNet-ViPC dataset show that the visual quality of this algorithm's completions surpasses that of single-modal point cloud completion techniques and of ViPC, the only existing cross-modal point cloud completion method. Its CD-L2 metric also outperforms ViPC on most test categories, with an average CD-L2 of 2.74, which is 17% lower than that of ViPC. To facilitate evaluation and use by other researchers, the code is available at: https://github.com/Starak-x/ImPoinTr
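The cross-modal fusion described in the abstract, in which per-point features from the PoinTr branch are combined with a global image feature from the convolutional image branch, can be illustrated with the minimal sketch below. The feature dimensions and the concatenate-and-project fusion operator are assumptions for illustration only; the paper's actual feature fusion module may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (for illustration only, not taken from the paper).
N, D_PT, D_IMG, D_OUT = 2048, 384, 256, 384

# Per-point features, as produced by the point cloud branch (PoinTr backbone).
point_feats = rng.standard_normal((N, D_PT))
# Global image feature, as produced by the 7-layer convolutional image branch.
img_feat = rng.standard_normal((D_IMG,))


def fuse(point_feats, img_feat, w, b):
    """Broadcast the global image feature to every point, concatenate it with
    the per-point features, and project back to the point-feature dimension.
    This concatenate-and-project scheme is a common fusion choice, used here
    as a stand-in for the paper's feature fusion module."""
    n = point_feats.shape[0]
    img_tiled = np.broadcast_to(img_feat, (n, img_feat.shape[0]))
    fused = np.concatenate([point_feats, img_tiled], axis=1)  # (N, D_PT + D_IMG)
    return fused @ w + b  # (N, D_OUT)


w = rng.standard_normal((D_PT + D_IMG, D_OUT)) * 0.01
b = np.zeros(D_OUT)
out = fuse(point_feats, img_feat, w, b)
print(out.shape)  # (2048, 384)
```

In a full model the fused per-point features would then drive the coarse-to-fine decoder that generates the completed point cloud.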