Sun Qichao, En Qing, Duan Lijuan, Qiao Yuanhua. RGB-D Image Semantic Segmentation Based on Multi-Modal Adaptive Convolution[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(8): 1272-1282. DOI: 10.3724/SP.J.1089.2022.19132

RGB-D Image Semantic Segmentation Based on Multi-Modal Adaptive Convolution

Abstract: With the availability of consumer depth sensors, much research has used both color and depth information for semantic segmentation. However, most previous methods fuse RGB features and depth features by equal-weight concatenation or summation, which fails to fully exploit the complementary information between the two modalities. Moreover, they build multi-scale representations with fixed-parameter multi-scale convolution kernels, which causes parameter redundancy and prevents online self-adaptation. To address these problems, an RGB-D image semantic segmentation method based on multi-modal adaptive convolution is proposed. A lightweight multi-modal adaptive convolution generation module dynamically produces multi-scale adaptive convolution kernels, embedding the complementary context of the multi-modal features into the convolution filters, so that the intrinsic information of the image is fully exploited during convolution and the color and depth features are fused efficiently. Compared with traditional convolution and multi-scale feature extraction methods, the proposed method achieves higher computational efficiency and better segmentation results. On the public SUN RGB-D and NYU Depth v2 datasets, the pixel accuracy, mean pixel accuracy, and mean IoU of the proposed method reach 82.5%, 62.0%, 50.6% and 77.1%, 64.2%, 50.8%, respectively, outperforming the compared RGB-D semantic segmentation methods.
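The core mechanism described in the abstract, generating a convolution kernel dynamically from the fused RGB-D context rather than using fixed-parameter kernels, can be sketched as follows. This is a toy NumPy reconstruction, not the authors' implementation: the elementwise-sum fusion, the global-average-pooled context vector, the mixture-of-kernel-experts formulation, and all tensor shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_same(x, w):
    """Naive 2-D convolution (cross-correlation, deep-learning style) with
    zero 'same' padding. x: (C_in, H, W); w: (C_out, C_in, k, k), odd k."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return out

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def adaptive_conv(rgb_feat, depth_feat, experts, w_attn):
    """Fuse the two modalities, pool a global context vector, and use it to
    mix a bank of kernel 'experts' into one input-specific kernel
    (an assumed stand-in for the paper's kernel-generation module)."""
    fused = rgb_feat + depth_feat                 # simple sum fusion (assumption)
    context = fused.mean(axis=(1, 2))             # global average pooling -> (C_in,)
    attn = softmax(w_attn @ context)              # mixing weights over K experts
    kernel = np.tensordot(attn, experts, axes=1)  # -> (C_out, C_in, k, k)
    return conv2d_same(fused, kernel), attn

# Toy setup: C_in=4 channels on an 8x8 map, K=3 kernel experts, C_out=2.
C_in, C_out, K, k = 4, 2, 3, 3
rgb = rng.standard_normal((C_in, 8, 8))
depth = rng.standard_normal((C_in, 8, 8))
experts = rng.standard_normal((K, C_out, C_in, k, k)) * 0.1
w_attn = rng.standard_normal((K, C_in)) * 0.1

out, attn = adaptive_conv(rgb, depth, experts, w_attn)
print(out.shape, attn)
```

Because the mixing weights depend on the pooled content of the fused features, a different input produces a different effective kernel at no extra kernel-parameter cost, which is the self-adaptation property the abstract contrasts with fixed multi-scale kernels.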

     
