Single-View 3D Object Reconstruction Based on CR-FFD and Offset Transformer Capsule Network
Abstract: Accurately mapping 2D images to 3D shapes is difficult in single-view 3D reconstruction of objects with complex topology. To address this, a single-view 3D reconstruction method is proposed that combines convolutional Catmull-Rom spline free-form deformation (CR-FFD) with an offset Transformer capsule network. First, the control points of the point cloud model are interpolated with Catmull-Rom spline basis functions, which keeps the local topological structure consistent during deformation. Second, a convolutional neural network based least-squares inverse solution is proposed, which accelerates the solving process through nonlinear parameter mapping. Finally, an offset-attention Transformer capsule network is designed to strengthen local feature representation and capture fine-grained features of the point cloud shape. On the ShapeNet dataset, the proposed method achieves an average EMD of 3.84 and an average CD of 3.71; on the Pix3D dataset, it achieves an average EMD of 5.51 and an average CD of 5.39. Compared with existing single-view point cloud 3D reconstruction methods, the proposed method effectively improves single-view reconstruction quality and keeps the reconstruction consistent across different viewing angles.
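The abstract does not give the Catmull-Rom basis functions themselves, so the following is only a minimal sketch of the standard uniform Catmull-Rom blending weights (tension 0.5, an assumption), written in plain Python. The names `catmull_rom_basis` and `interpolate` are hypothetical and are not taken from the paper; the full CR-FFD, which applies such weights over a 3D control lattice, is not reproduced here.

```python
def catmull_rom_basis(t):
    """Return the four uniform Catmull-Rom blending weights at t in [0, 1].

    Assumption: the standard uniform form with tension 0.5. The paper's
    exact parameterization may differ.
    """
    t2 = t * t
    t3 = t2 * t
    return (
        0.5 * (-t3 + 2.0 * t2 - t),        # weight of control point p0
        0.5 * (3.0 * t3 - 5.0 * t2 + 2.0),  # weight of control point p1
        0.5 * (-3.0 * t3 + 4.0 * t2 + t),   # weight of control point p2
        0.5 * (t3 - t2),                    # weight of control point p3
    )


def interpolate(p0, p1, p2, p3, t):
    """Evaluate the curve segment between p1 and p2 at parameter t.

    Each point is a tuple of coordinates; all four must have equal length.
    """
    weights = catmull_rom_basis(t)
    points = (p0, p1, p2, p3)
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, points))
        for axis in range(len(p0))
    )
```

Unlike a B-spline, this curve passes through its control points: the weights reduce to (0, 1, 0, 0) at t = 0 and to (0, 0, 1, 0) at t = 1, and they sum to 1 for every t. Moving one control point therefore only affects the neighbouring segments, which is the kind of local behaviour the abstract appeals to when it describes preserving local topological structure.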