Cui Zhiqiang, Feng Zhengyong, Wang Feng, Liu Qiang. RGB-D Saliency Detection Based on Attention Mechanism and Multi-Scale Cross-Modal Fusion[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(6): 893-902. DOI: 10.3724/SP.J.1089.2023.19479


RGB-D Saliency Detection Based on Attention Mechanism and Multi-Scale Cross-Modal Fusion


    Abstract: To address the limited performance of RGB-D saliency detection based on deep convolutional neural networks, a method using an attention mechanism and multi-scale cross-modal fusion is proposed. First, a multi-scale residual attention module preprocesses the features extracted by the backbone network. Second, a multi-scale cross-modal fusion strategy fuses the high-level RGB and depth features to obtain an initial saliency map. Finally, a boundary refinement module refines the object boundaries in the initial saliency map, so that the final saliency map contains sharp boundaries and complete salient objects. Experiments against ten state-of-the-art methods on five benchmark datasets show that the proposed method ranks in the top three on all four evaluation metrics; on the NJUD and SIP datasets in particular, it improves all four metrics by 0.5%-1.5%.
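The attention-then-fuse idea described in the abstract can be sketched at a toy level as follows. This is an illustrative NumPy sketch only, not the paper's implementation: the function names, the simple channel-attention form (sigmoid of globally pooled features), and the element-wise addition used for fusion are all assumptions, and the multi-scale upsampling and boundary refinement steps are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # Toy channel attention: global average pooling over the spatial
    # dimensions gives one weight per channel, squashed to (0, 1).
    w = sigmoid(feat.mean(axis=(1, 2)))          # shape (C,)
    return feat * w[:, None, None]               # reweight each channel

def cross_modal_fusion(rgb_feats, depth_feats):
    # Fuse attention-reweighted RGB and depth features at each scale.
    # For simplicity all scales are assumed to share the same spatial
    # size here; a real network would upsample coarser maps first.
    fused = [channel_attention(r) + channel_attention(d)
             for r, d in zip(rgb_feats, depth_feats)]
    # Collapse the channels of the finest fused map into an initial
    # saliency map with values in (0, 1).
    return sigmoid(fused[0].sum(axis=0))

# Two scales of random 8-channel, 16x16 feature maps per modality.
rgb = [np.random.rand(8, 16, 16) for _ in range(2)]
depth = [np.random.rand(8, 16, 16) for _ in range(2)]
saliency = cross_modal_fusion(rgb, depth)        # shape (16, 16)
```

The resulting `saliency` array plays the role of the initial saliency map; in the paper it would then be passed through the boundary refinement module.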


