Citation: Miao Yongwei, Chen Jiahui, Zhang Xinjie, Ma Wenjuan, Sun Shusen. Efficient 3D Object Detection of Indoor Scenes Based on RGB-D Video Stream[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(7): 1015-1025. DOI: 10.3724/SP.J.1089.2021.18630

Efficient 3D Object Detection of Indoor Scenes Based on RGB-D Video Stream

  • Abstract: Detecting every 3D object in an indoor scene in a single pass is difficult: RGB-D scans of complex indoor environments are often incomplete, objects occlude one another, and a single RGB-D frame or point cloud covers only part of the scene. To capture all objects in a scene and to reduce the cost of 3D detection, an efficient 3D object detection method is proposed that takes an RGB-D video stream as input. First, the RGB-D video stream of the indoor environment is captured with a Kinect camera and preprocessed into consecutive color frames with their corresponding point clouds. Second, a hash algorithm extracts content-sensitive key frames from the consecutive color frames, and an object semantic relationship is built for adjacent key frames from the types and numbers of objects they contain, to ensure that each key frame contains different objects. Then, VoteNet detects 3D objects in the key-frame point clouds, and the detection results for the remaining frames are estimated by interpolating the relative pose between adjacent key frames with quaternion spherical linear interpolation, which yields 3D detections for every frame of the RGB-D video stream. The key-frame detection network is trained on the SUN RGB-D dataset. Compared with frame-by-frame detection using VoteNet, the proposed method produces accurate detections while greatly reducing the overall detection time for the video stream. Experimental results demonstrate that the proposed method is effective and efficient.
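Two of the steps named in the abstract, hash-based key-frame extraction and quaternion spherical linear interpolation, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which hash is used, so an average hash is assumed here. The function names, the 8×8 hash size, the Hamming-distance threshold of 10 bits, and the (w, x, y, z) quaternion order are all hypothetical choices made for illustration. The semantic-relationship step and the VoteNet detector are not shown.

```python
import numpy as np


def average_hash(gray, hash_size=8):
    """Flat boolean average hash of a 2D grayscale image (assumed hash type)."""
    h, w = gray.shape
    bh, bw = h // hash_size, w // hash_size
    # Crop so the image divides evenly into hash_size x hash_size blocks.
    cropped = gray[:bh * hash_size, :bw * hash_size].astype(np.float64)
    # Average each block, then threshold against the mean of the block averages.
    blocks = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (blocks > blocks.mean()).ravel()


def select_key_frames(frames, threshold=10):
    """Indices of frames whose hash differs from the previous key frame
    by more than `threshold` bits (hypothetical selection rule)."""
    keys = [0]
    last = average_hash(frames[0])
    for i in range(1, len(frames)):
        cur = average_hash(frames[i])
        if np.count_nonzero(cur != last) > threshold:
            keys.append(i)
            last = cur
    return keys


def slerp(q0, q1, t):
    """Spherical linear interpolation between two quaternions, 0 <= t <= 1."""
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1, dot = -q1, -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation
        # avoids dividing by a vanishing sin(theta).
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
```

In such a setup, a non-key frame lying a fraction `t` of the way between two key frames would take its orientation from `slerp(q_prev, q_next, t)`, where `q_prev` and `q_next` are the key-frame orientations. The translation could be interpolated linearly in the same way.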

     
