Abstract:
To address the limited performance of RGB-D saliency detection methods based on deep convolutional neural networks, an RGB-D saliency detection method using an attention mechanism and multi-scale cross-modal fusion is proposed. First, a multi-scale residual attention module preprocesses the features extracted by the backbone network. Second, a multi-scale cross-modal fusion strategy fuses the high-level RGB and depth features to obtain an initial saliency map. Finally, a boundary refinement module refines the object boundaries in the initial saliency map, so that the final saliency map contains sharp boundaries and complete salient objects. Comparative experiments against ten advanced methods on five benchmark datasets show that the proposed method ranks in the top three on four evaluation metrics; on the NJUD and SIP datasets in particular, it improves by 0.5%-1.5% on the four metrics.