Chao Yang, Zheng Guan, Xue Wang, Wenbi Ma. Saliency Enhancement and Global Awareness for RGB-T Salient Object Detection[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.2023-00678


Saliency Enhancement and Global Awareness for RGB-T Salient Object Detection


Abstract: Effectively capturing and exploiting the characteristics and complementary potential of an image's different feature layers is essential for accurately locating salient objects and preserving their detailed contours. Most existing RGB-T salient object detection methods feed the extracted image features directly into aggregation modules and rely on a simple recursive structure to locate salient objects, which limits their generalization ability. To address this problem, a saliency-enhancement and global-awareness method for RGB-T salient object detection is proposed. First, a pre-trained model is used to extract the original features of the image. Second, a noise reduction module (NRM) is proposed as a bridge between the encoder and the cross-modal decoder to purify the feature representation of salient objects. Third, a high-level semantic guidance module (HSGB) is proposed to fuse the high-level semantic information of different modalities and preserve the location information of salient objects. Finally, a multi-modal interaction module (MMIB) is designed to retain the location information of salient objects while guiding the decoder to aggregate multi-level features into the final saliency map. Experimental results on the VT821, VT1000, and VT5000 datasets show that the proposed method achieves MAE scores of 3.0%, 1.8%, and 3.0%, respectively, outperforming most mainstream methods.
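The MAE figures quoted above follow the standard saliency-evaluation definition: the mean absolute difference between the predicted saliency map and the ground-truth mask, both normalized to [0, 1], averaged over all pixels. The snippet below is a minimal pure-Python sketch of that metric; the function name `mae` and the toy 2x2 maps are illustrative choices, not from the paper, and real evaluations typically operate on full-resolution images with numpy.

```python
def mae(pred, gt):
    """Mean absolute error between two equal-sized 2-D maps with values in [0, 1].

    MAE = (1 / (W * H)) * sum over pixels of |S(x, y) - G(x, y)|,
    where S is the predicted saliency map and G the ground-truth mask.
    """
    total, count = 0.0, 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            total += abs(p - g)
            count += 1
    return total / count

# Toy 2x2 example: a perfect prediction gives MAE = 0.0, while a uniform
# 0.5 prediction against a binary mask gives MAE = 0.5.
print(mae([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))  # → 0.0
```

Lower MAE is better, which is why the 1.8% score on VT1000 represents the strongest of the three reported results.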

