Abstract:
To fully fuse RGB and depth information and further improve semantic segmentation accuracy, an attention mechanism is introduced to realize complementary fusion of RGB and depth modality features. The proposed RGB-D dual-modal information complementary semantic segmentation network is built on an encoder-decoder framework: the encoder adopts a dual-branch structure to extract feature maps from the RGB image and the depth image respectively, and the decoder uses layer-by-layer skip connections to progressively integrate semantic information of different granularities for pixel-level semantic classification. For features learned in the lower layers, the encoder applies an RGB-D information complementary module to mutually fuse features from one modality into the other. This module comprises two attention mechanisms, the Depth-guided Attention Module (Depth-AM) and the RGB-guided Attention Module (RGB-AM). The Depth-AM takes the original depth information as a supplement to RGB features, addressing inaccurate RGB features caused by illumination changes; the RGB-AM takes RGB features as supplementary information for depth features, addressing inaccurate depth features caused by a lack of object texture. With a backbone of the same structure, the proposed network clearly outperforms RDF-Net: mIoU, pixel accuracy, and mean pixel accuracy improve by 1.8%, 0.5%, and 0.7% on the SUN RGB-D dataset, and by 1.8%, 1.3%, and 1.9% on the NYUv2 dataset.
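The abstract does not specify the internal design of Depth-AM and RGB-AM. Purely as an illustration, the following is a minimal PyTorch sketch of one plausible realization: each module computes channel-attention weights from the guiding modality and applies them to the target modality's features with a residual fusion. The class names (`CrossModalAttention`, `RGBDComplementaryModule`), the squeeze-and-excitation-style attention form, and the residual combination are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Hypothetical sketch: channel attention computed from a guiding
    modality and applied to the target modality's features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, target, guide):
        # Attention weights derived from the guiding modality.
        b, c, _, _ = guide.shape
        w = self.mlp(self.pool(guide).view(b, c)).view(b, c, 1, 1)
        # Guided reweighting supplements the target features (residual fusion).
        return target + target * w

class RGBDComplementaryModule(nn.Module):
    """Hypothetical mutual fusion: Depth-AM uses depth to refine RGB
    features; RGB-AM uses RGB to refine depth features."""
    def __init__(self, channels):
        super().__init__()
        self.depth_am = CrossModalAttention(channels)  # depth guides RGB
        self.rgb_am = CrossModalAttention(channels)    # RGB guides depth

    def forward(self, rgb_feat, depth_feat):
        rgb_out = self.depth_am(rgb_feat, depth_feat)
        depth_out = self.rgb_am(depth_feat, rgb_feat)
        return rgb_out, depth_out

# Usage on dummy low-level feature maps from the two encoder branches:
rgb = torch.randn(2, 64, 120, 160)
depth = torch.randn(2, 64, 120, 160)
rgb_fused, depth_fused = RGBDComplementaryModule(64)(rgb, depth)
```

Under these assumptions, each branch keeps its own feature stream while borrowing attention cues from the other modality, which matches the abstract's description of complementary rather than merged fusion.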