Chen Zheng, Zhao Xiaoli, Zhang Jiaying, Yin Mingchen, Ye Hanchen, Zhou Haojun. RGB-D Image Saliency Detection Based on Cross-Modal Feature Fusion[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(11): 1688-1697. DOI: 10.3724/SP.J.1089.2021.18710

RGB-D Image Saliency Detection Based on Cross-Modal Feature Fusion


Abstract: To address the problem that saliency detection based on RGB images alone cannot accurately detect salient objects in scenes with multiple or small targets, a novel saliency detection network based on RGB-D cross-modal feature fusion is proposed. The network takes an improved fully convolutional network (FCN) as its dual-stream backbone, extracts color and depth features and makes predictions separately, and finally fuses them through an Inception structure to generate the final saliency map. Because the actual receptive field of the original FCN is far smaller than its theoretical receptive field, the network does not truly exploit global image information; to remedy this, a dual-branch global and local feature extraction block is designed, in which the global branch extracts global information and guides local feature extraction, and the improved FCN is built from these blocks. In addition, considering the differences between color and depth features at different levels, a cross-modal feature fusion module is proposed that uses element-wise multiplication to selectively fuse color and depth features; compared with addition and concatenation, multiplication more effectively suppresses noise and redundant information. Comprehensive experiments on three public benchmark datasets against 21 mainstream networks show that the proposed model generally ranks in the top three on the S-measure, F-measure, and MAE metrics. Model size is also compared: the proposed model is only 4.7% the size of MMCI, a 22.8% reduction relative to A2dele, the smallest existing model.
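The abstract's argument for fusing modalities by element-wise multiplication rather than addition or concatenation can be illustrated with a minimal numpy sketch. The function name, toy feature values, and tensor shapes below are illustrative assumptions, not taken from the paper; the point is only that the product acts as a mutual gate, so a response survives only where both modalities agree.

```python
import numpy as np

def cross_modal_fuse(rgb_feat, depth_feat):
    """Fuse RGB and depth feature maps by element-wise product.

    Unlike addition or concatenation, multiplication suppresses a
    response that appears in only one modality, which is the
    noise-reduction effect the paper attributes to this choice.
    """
    assert rgb_feat.shape == depth_feat.shape
    return rgb_feat * depth_feat

# Toy feature maps with shape (batch, channel, H, W).
rgb = np.array([[[[0.9, 0.1],
                  [0.8, 0.0]]]])
depth = np.array([[[[0.7, 0.9],   # spurious depth response at (0, 1)
                    [0.6, 0.0]]]])

fused_mul = cross_modal_fuse(rgb, depth)  # [[0.63, 0.09], [0.48, 0.0]]
fused_add = rgb + depth                   # [[1.6,  1.0 ], [1.4,  0.0]]
```

In the additive result, the spurious depth-only response at position (0, 1) remains strong (1.0), while in the multiplicative result it is damped to 0.09 because the RGB stream does not confirm it.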
