Sun Qichao, En Qing, Duan Lijuan, Qiao Yuanhua. RGB-D Image Semantic Segmentation Based on Multi-Modal Adaptive Convolution[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(8): 1272-1282. DOI: 10.3724/SP.J.1089.2022.19132

RGB-D Image Semantic Segmentation Based on Multi-Modal Adaptive Convolution

Abstract: With the availability of consumer depth sensors, much research has used both color and depth information for semantic segmentation. However, most previous methods fuse RGB features and depth features by equal-weight concatenation or summation, which fails to fully exploit the complementary information between the two modalities. Moreover, they build multi-scale representations with fixed-parameter multi-scale convolution kernels, which causes parameter redundancy and prevents online self-adaptation. To address these problems, an RGB-D image semantic segmentation method based on multi-modal adaptive convolution is proposed. A lightweight multi-modal adaptive convolution generation module dynamically produces multi-scale adaptive convolution kernels, embedding the complementary context of the multi-modal features into the convolution filters, so that the intrinsic information of the image is fully exploited during convolution and the color and depth features are fused efficiently. Compared with traditional convolution and multi-scale feature extraction methods, the proposed method achieves higher computational efficiency and better segmentation results. On the public SUN RGB-D and NYU Depth v2 datasets, the pixel accuracy, mean pixel accuracy, and mean IoU of the proposed method reach 82.5%, 62.0%, 50.6% and 77.1%, 64.2%, 50.8%, respectively, outperforming the compared RGB-D semantic segmentation methods.
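The core mechanism described in the abstract, generating a convolution kernel dynamically from the fused RGB-D context rather than using fixed-parameter kernels, can be sketched as follows. This is a toy NumPy reconstruction, not the authors' implementation: the elementwise-sum fusion, the global-average-pooled context vector, the mixture-of-kernel-experts formulation, and all tensor shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_same(x, w):
    """Naive 2-D convolution (cross-correlation, deep-learning style) with
    zero 'same' padding. x: (C_in, H, W); w: (C_out, C_in, k, k), odd k."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return out

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def adaptive_conv(rgb_feat, depth_feat, experts, w_attn):
    """Fuse the two modalities, pool a global context vector, and use it to
    mix a bank of kernel 'experts' into one input-specific kernel
    (an assumed stand-in for the paper's kernel-generation module)."""
    fused = rgb_feat + depth_feat                 # simple sum fusion (assumption)
    context = fused.mean(axis=(1, 2))             # global average pooling -> (C_in,)
    attn = softmax(w_attn @ context)              # mixing weights over K experts
    kernel = np.tensordot(attn, experts, axes=1)  # -> (C_out, C_in, k, k)
    return conv2d_same(fused, kernel), attn

# Toy setup: C_in=4 channels on an 8x8 map, K=3 kernel experts, C_out=2.
C_in, C_out, K, k = 4, 2, 3, 3
rgb = rng.standard_normal((C_in, 8, 8))
depth = rng.standard_normal((C_in, 8, 8))
experts = rng.standard_normal((K, C_out, C_in, k, k)) * 0.1
w_attn = rng.standard_normal((K, C_in)) * 0.1

out, attn = adaptive_conv(rgb, depth, experts, w_attn)
print(out.shape, attn)
```

Because the mixing weights depend on the pooled content of the fused features, a different input produces a different effective kernel at no extra kernel-parameter cost, which is the self-adaptation property the abstract contrasts with fixed multi-scale kernels.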

     
