RGB-D Image Semantic Segmentation Based on Multi-Modal Adaptive Convolution
Abstract
With the availability of consumer RGB-D sensors, much research has exploited both color and depth information for semantic segmentation. However, most previous studies simply fuse RGB features and depth features by equal-weight concatenation or summation, which may fail to exploit the complementary information between the two modalities. Moreover, previous works construct multi-scale representations with fixed-parameter multi-scale convolution kernels, which can cause parameter redundancy and cannot adapt online to the input. To effectively utilize the internal context information of multi-modal features, an RGB-D image semantic segmentation network is proposed that introduces a multi-modal adaptive convolution module. Multi-scale adaptive convolution kernels are generated dynamically, so that the context information of the multi-modal features is embedded effectively into the multi-scale convolution filters. Compared with traditional multi-scale convolution kernels, the proposed method achieves higher computational efficiency and better accuracy. Experimental results on the public RGB-D indoor semantic segmentation datasets SUN RGB-D and NYU Depth v2 show that the pixel accuracy, mean pixel accuracy, and mean IoU of the proposed method reach 82.5%, 62.0%, and 50.6% on SUN RGB-D and 77.1%, 64.2%, and 50.8% on NYU Depth v2, respectively, outperforming existing RGB-D semantic segmentation methods.
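The abstract does not specify implementation details, so the following is only a minimal PyTorch-style sketch of one plausible reading of the multi-modal adaptive convolution idea: a global context vector fused from RGB and depth features dynamically generates depthwise kernels at several scales. The module and parameter names (MultiModalAdaptiveConv, fuse, heads, kernel_sizes) are hypothetical illustrations, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalAdaptiveConv(nn.Module):
    """Hypothetical sketch of multi-modal adaptive convolution:
    fused RGB-D context generates per-sample depthwise kernels
    at multiple scales (not the authors' exact design)."""

    def __init__(self, channels, kernel_sizes=(3, 5)):
        super().__init__()
        self.channels = channels
        self.kernel_sizes = kernel_sizes
        # Fuse RGB and depth global context into a single vector.
        self.fuse = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
        )
        # One generator head per scale, emitting depthwise kernel weights.
        self.heads = nn.ModuleList(
            nn.Linear(channels, channels * k * k) for k in kernel_sizes
        )

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat, depth_feat: (B, C, H, W)
        b, c, h, w = rgb_feat.shape
        # Global average pooling of both modalities -> (B, 2C) context.
        ctx = torch.cat(
            [rgb_feat.mean(dim=(2, 3)), depth_feat.mean(dim=(2, 3))], dim=1
        )
        ctx = self.fuse(ctx)                   # (B, C)
        x = rgb_feat + depth_feat              # simple fusion of the two streams
        # Batch trick: fold the batch into channels so each sample
        # can be convolved with its own dynamically generated kernel.
        x_flat = x.reshape(1, b * c, h, w)
        out = 0
        for k, head in zip(self.kernel_sizes, self.heads):
            kernels = head(ctx).reshape(b * c, 1, k, k)
            out = out + F.conv2d(
                x_flat, kernels, padding=k // 2, groups=b * c
            ).reshape(b, c, h, w)
        return out

Under this reading, the kernels are a function of the input rather than fixed parameters, so a single lightweight generator replaces a bank of fixed multi-scale filters; this is one way the parameter-efficiency and online self-adaptation claims of the abstract could be realized.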