RGB-D Dual Modal Information Complementary Semantic Segmentation Network

Abstract: To fully fuse RGB and depth information and thereby further improve semantic segmentation accuracy, an attention mechanism is introduced to achieve complementary fusion of RGB and depth features. The proposed RGB-D dual-modal information complementary semantic segmentation network is based on an encoder-decoder framework. The encoder uses a dual-branch structure to extract features from the RGB image and the depth image separately, and the decoder uses layer-by-layer skip connections to progressively fuse semantic information of different granularities and perform pixel-level semantic classification. For the low-level features learned by the two encoder branches, an RGB-D information complementary module fuses the features of each modality into the other. This module contains two attention modules, the Depth-guided Attention Module (Depth-AM) and the RGB-guided Attention Module (RGB-AM). Depth-AM supplements the RGB features with depth information, addressing inaccurate RGB features caused by illumination changes; RGB-AM supplements the depth features with RGB information, addressing inaccurate depth features caused by the lack of object texture information in depth data. Using a backbone of the same structure, the proposed network outperforms RDF-Net: on the SUNRGB-D dataset, the mean intersection over union (mIoU), pixel accuracy and mean accuracy improve by 1.8%, 0.5% and 0.7%, respectively, and on the NYUv2 dataset they improve by 1.8%, 1.3% and 1.9%, respectively.
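The abstract does not describe the internals of Depth-AM and RGB-AM, so the following is only a minimal sketch of the idea of bidirectional, attention-gated complementary fusion. It assumes a squeeze-and-excitation-style channel gate, which is a swapped-in technique and not the authors' published design: each channel of the guiding modality is globally average-pooled, squashed to a gate in (0, 1) by a sigmoid, and the gated guide features are added to the target features. The names `guided_attention`, `depth_am`, `rgb_am` and `complementary_module` are hypothetical, and a real module would apply learned layers to tensors rather than plain lists.

```python
import math
from typing import List, Tuple

# A feature map is C channels, each a flattened list of H*W activations.
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def guided_attention(target: FeatureMap, guide: FeatureMap) -> FeatureMap:
    """Fuse `guide` into `target` through a channel gate computed from `guide`.

    Hypothetical SE-style stand-in for Depth-AM / RGB-AM: no learned weights,
    and not the paper's exact module.
    """
    if len(target) != len(guide):
        raise ValueError("target and guide must have the same number of channels")
    fused: FeatureMap = []
    for t_ch, g_ch in zip(target, guide):
        if len(t_ch) != len(g_ch):
            raise ValueError("paired channels must have the same spatial size")
        # Squeeze: global average pooling of the guiding channel.
        descriptor = sum(g_ch) / len(g_ch)
        # Excite: map the descriptor to a gate in (0, 1).
        gate = sigmoid(descriptor)
        # Complement: add the gated guide activations to the target activations.
        fused.append([t + gate * g for t, g in zip(t_ch, g_ch)])
    return fused


def depth_am(rgb: FeatureMap, depth: FeatureMap) -> FeatureMap:
    """Depth-guided direction: supplement RGB features with depth features."""
    return guided_attention(rgb, depth)


def rgb_am(depth: FeatureMap, rgb: FeatureMap) -> FeatureMap:
    """RGB-guided direction: supplement depth features with RGB features."""
    return guided_attention(depth, rgb)


def complementary_module(rgb: FeatureMap, depth: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    # Both directions read the original inputs, so the two fusions run in parallel.
    return depth_am(rgb, depth), rgb_am(depth, rgb)
```

For example, with `rgb = [[0.2, 0.4], [1.0, 0.0]]` and `depth = [[0.0, 0.0], [2.0, 2.0]]`, the first RGB channel comes out unchanged because the depth channel paired with it contains only zeros, while the second RGB channel gains `sigmoid(2.0)` times the depth activations. Running both directions on the original inputs is a design choice of this sketch; the abstract does not say whether the paper fuses the two modalities sequentially or in parallel.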

     

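The three metrics reported in the abstract (mIoU, pixel accuracy and mean accuracy) are conventionally computed from a per-class confusion matrix. The sketch below uses those standard definitions. The abstract does not state the evaluation protocol, so two details are assumptions: label `255` marks unlabeled pixels (the hypothetical `IGNORE_INDEX`), and classes absent from the ground truth are skipped when averaging per-class accuracy. The function names are illustrative.

```python
from typing import Dict, List, Sequence

# Assumption: label 255 marks unlabeled pixels, a common convention in
# segmentation benchmarks; the abstract does not say how they are handled.
IGNORE_INDEX = 255


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Rows index the ground-truth class, columns the predicted class."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # unlabeled pixels do not count toward any metric
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    class_acc: List[float] = []
    ious: List[float] = []
    for i in range(n):
        tp = cm[i][i]
        gt_i = sum(cm[i])                         # pixels labeled as class i
        pred_i = sum(cm[r][i] for r in range(n))  # pixels predicted as class i
        if gt_i > 0:  # assumption: skip classes absent from the ground truth
            class_acc.append(tp / gt_i)
        union = gt_i + pred_i - tp
        if union > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / union)
    return {
        "pixel_acc": correct / total,                 # correct / all labeled pixels
        "mean_acc": sum(class_acc) / len(class_acc),  # per-class accuracy, averaged
        "miou": sum(ious) / len(ious),                # per-class IoU, averaged
    }
```

On a toy six-pixel example with three classes, `gt = [0, 0, 1, 1, 2, 2]` and `pred = [0, 1, 1, 1, 2, 0]`, this gives a pixel accuracy of 4/6, a mean accuracy of 2/3 and an mIoU of 0.5. Pixel accuracy is dominated by frequent classes, whereas mean accuracy and mIoU weight every class equally. This is why indoor benchmarks such as SUNRGB-D and NYUv2 are usually reported with all three.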