Wang Lichun, Gu Nana, Xin Jianjia, Wang Shaofan. RGB-D Dual Modal Information Complementary Semantic Segmentation Network[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(10): 1489-1499. DOI: 10.3724/SP.J.1089.2023.19592

RGB-D Dual Modal Information Complementary Semantic Segmentation Network

Abstract: To fully fuse RGB and depth information and further improve semantic segmentation accuracy, an attention mechanism is introduced to realize complementary fusion of RGB and depth modal features. The proposed RGB-D dual-modal information complementary semantic segmentation network is built on an encoder-decoder framework: the encoder adopts a dual-branch structure to extract features from the RGB image and the depth image respectively, and the decoder uses layer-by-layer skip connections to progressively integrate semantic information of different granularities for pixel-level semantic classification. For the low-level features learned by the two branches, the encoder applies an RGB-D information complementary module to fuse the features of one modality into the other. This module contains two kinds of attention: the Depth-guided Attention Module (Depth-AM) and the RGB-guided Attention Module (RGB-AM). Depth-AM supplements RGB features with depth information to address inaccurate RGB features caused by illumination changes; RGB-AM supplements depth features with RGB information to address inaccurate depth features caused by the lack of object texture. Using backbones of the same structure, the proposed network outperforms RDF-Net: mean IoU, pixel accuracy, and mean pixel accuracy improve by 1.8%, 0.5%, and 0.7% on the SUNRGB-D dataset, and by 1.8%, 1.3%, and 1.9% on the NYUv2 dataset.
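The abstract does not specify the internal form of Depth-AM and RGB-AM, so the following is only a minimal NumPy sketch of the general idea of cross-modal complementary fusion: a gating weight derived from one modality's globally pooled features reweights the other modality's features, and the result is added back as a residual. All function names and the channel-attention formulation here are illustrative assumptions, not the paper's actual modules.

```python
import numpy as np

def channel_attention(guide, target):
    """Hypothetical cross-modal channel attention (not the paper's exact design):
    a sigmoid gate computed from the guiding modality's globally pooled
    features reweights the target modality's feature map channel-wise."""
    # Global average pool the guide over spatial dims -> shape (C,)
    pooled = guide.mean(axis=(1, 2))
    # Sigmoid gate per channel
    gate = 1.0 / (1.0 + np.exp(-pooled))
    # Broadcast the gate over the spatial dimensions of the target
    return target * gate[:, None, None]

def rgbd_complementary_fusion(f_rgb, f_depth):
    """Illustrative RGB-D information complementary module:
    each modality is supplemented by attention guided by the other."""
    # Depth-AM direction: depth-guided attention supplements the RGB features
    rgb_enhanced = f_rgb + channel_attention(f_depth, f_rgb)
    # RGB-AM direction: RGB-guided attention supplements the depth features
    depth_enhanced = f_depth + channel_attention(f_rgb, f_depth)
    return rgb_enhanced, depth_enhanced

# Toy feature maps with (channels, height, width) layout
C, H, W = 64, 8, 8
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((C, H, W))
f_depth = rng.standard_normal((C, H, W))
r, d = rgbd_complementary_fusion(f_rgb, f_depth)
print(r.shape, d.shape)  # (64, 8, 8) (64, 8, 8)
```

The residual form (`f + attention(...)`) mirrors the abstract's framing of attention as *supplementing* one modality with information from the other rather than replacing it; the fused low-level features would then feed the deeper encoder layers and the skip-connected decoder.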
