3D Indoor Scene Reconstruction Method Driven by Object Detection
Abstract: Indoor scenes contain many kinds of objects packed closely together, and occlusion between objects leaves gaps in the data. These factors make it harder for computers to perceive and reconstruct indoor environments. We focus on complex indoor point cloud scenes and analyze the semantic and instance information of the objects they contain. On that basis, we propose a 3D indoor scene reconstruction method driven by object detection. First, we build on VoteNet in three ways. We add a fused convolution-pooling module to extract richer local features. We design an attention-based voting weight module that strengthens attention to foreground features. We use an object relation module to learn spatial relationship features. Together, these components improve indoor object detection. Next, we extract valid object point clouds according to the intersection over union (IoU) of overlapping bounding boxes. We then apply an optimal model retrieval algorithm to obtain more accurate matching models from the model library. Finally, we take the object instance information and the model retrieval information as input and optimize the model poses using geometric spatial constraints from the surrounding environment. This yields more plausible and more detailed indoor scene reconstructions. Experiments on the ScanNet dataset show that the proposed method achieves an average precision of 63.8%, 7.0 percentage points higher than VoteNet. The method reconstructs scenes accurately and remains robust and accurate when incomplete objects would otherwise lower matching precision.
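The abstract only states that valid object point clouds are extracted "according to the IoU of bounding boxes". The sketch below shows one plausible reading of that step. It assumes two things: the detected boxes are axis-aligned, and the IoU test acts as greedy 3D non-maximum suppression before the scene points are cropped to each surviving box. The names `Box3D`, `iou_3d`, `select_boxes`, `crop_points` and `extract_object_clouds` are hypothetical, and so is the 0.25 threshold. They illustrate the idea and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Box3D:
    """Hypothetical axis-aligned 3D box: center, full extents (dx, dy, dz), detection score."""
    center: Point
    size: Point
    score: float = 1.0

    def bounds(self) -> Tuple[Point, Point]:
        # Minimum and maximum corners of the box.
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi

    def volume(self) -> float:
        dx, dy, dz = self.size
        return dx * dy * dz


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection over Union of two axis-aligned 3D boxes."""
    (alo, ahi), (blo, bhi) = a.bounds(), b.bounds()
    inter = 1.0
    for i in range(3):
        overlap = min(ahi[i], bhi[i]) - max(alo[i], blo[i])
        if overlap <= 0:
            return 0.0  # Disjoint along this axis, so no intersection at all.
        inter *= overlap
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


def select_boxes(boxes: Sequence[Box3D], iou_thresh: float = 0.25) -> List[Box3D]:
    """Greedy 3D NMS (assumed): keep boxes by descending score, dropping any box
    whose IoU with an already kept box exceeds iou_thresh."""
    kept: List[Box3D] = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        if all(iou_3d(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept


def crop_points(points: Sequence[Point], box: Box3D) -> List[Point]:
    """Return the scene points that lie inside the box (boundary inclusive)."""
    lo, hi = box.bounds()
    return [p for p in points if all(lo[i] <= p[i] <= hi[i] for i in range(3))]


def extract_object_clouds(points: Sequence[Point], boxes: Sequence[Box3D],
                          iou_thresh: float = 0.25) -> List[Tuple[Box3D, List[Point]]]:
    """Pair each retained detection with the points it encloses."""
    return [(box, crop_points(points, box)) for box in select_boxes(boxes, iou_thresh)]
```

Each cropped cloud would then feed the model retrieval stage. Greedy NMS is a swapped-in, commonly used way to resolve overlapping detections. The paper's actual IoU criterion may differ, for example by splitting the points shared by two overlapping boxes instead of discarding the lower-scoring box.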