Citation: Deqiang Cheng, Shuai Xu, Chenggong Han, Chen Lü, Qiqi Kou, Jianying Zhang. Visual Attention-Based Self-Supervised Monocular Depth Estimation[J]. Journal of Computer-Aided Design & Computer Graphics.

Visual Attention-Based Self-Supervised Monocular Depth Estimation

Abstract: To address the poor depth predictions that existing monocular depth estimation methods produce in regions of complex texture, we propose a visual attention-based self-supervised monocular depth estimation method. First, the encoder takes multi-scale source images as input, allowing multi-path features to be fused more effectively. Second, parallel intermediate attention modules interact across regions, modeling semantic dependencies along the spatial and channel dimensions respectively to obtain rich contextual information. In addition, the decoder is built from successive external-attention feature aggregation modules, which exploit this contextual information to resolve the maladjustment problem in complex regions. Experimental results on the KITTI and Cityscapes datasets show that the proposed method outperforms current mainstream methods, with better depth prediction in complex texture regions; on KITTI, RMS and RMSlog reach 4.486 and 0.181, respectively.
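The abstract does not give the equations of the external-attention feature aggregation modules used in the decoder. As a reference point only, the following is a minimal sketch of the standard external attention mechanism (Guo et al.), in which attention is computed between the input features and two small learnable memory matrices rather than within the input itself; all names, shapes, and the double-normalization step are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def external_attention(x, m_k, m_v):
    """Sketch of external attention.

    x   : (n, d) flattened feature map (n positions, d channels)
    m_k : (s, d) learnable key memory (s memory units)
    m_v : (s, d) learnable value memory
    Returns an (n, d) aggregated feature map.
    """
    attn = softmax(x @ m_k.T, axis=1)                        # (n, s): similarity to memory units
    attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-9)   # double normalization over positions
    return attn @ m_v                                        # (n, d): aggregate value memory

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 32))    # 64 positions, 32 channels
m_k = rng.standard_normal((8, 32))   # s = 8 memory units
m_v = rng.standard_normal((8, 32))
out = external_attention(x, m_k, m_v)
print(out.shape)  # (64, 32)
```

Because the memory size s is small and shared across all inputs, this kind of module is linear in the number of positions, which is one reason external attention is attractive for dense prediction tasks such as depth estimation.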

     
