

Typical Concept-Driven Modality-missing Deep Cross-Modal Retrieval


     

    Abstract: Cross-modal retrieval uses data from one modality as a query to retrieve semantically relevant data in another modality. Most existing cross-modal retrieval methods are designed for scenarios in which all modalities are complete, and their ability to handle missing-modality data remains limited. To address this, we propose a typical concept-driven modality-missing deep cross-modal retrieval model. We first design a multi-modal Transformer that integrates multi-modal pretrained networks; even with missing modalities, it performs fine-grained multi-modal semantic interaction, extracts multi-modal fusion semantics, constructs a cross-modal subspace, and simultaneously guides the learning of multi-modal typical concepts. The typical concepts are then used as the keys and values of a cross-attention module to drive the training of the modality mapping network, so that the network can adaptively perceive the multi-modal semantic concepts implicit in the query-modality data, generate cross-modal retrieval features, and fully preserve the multi-modal fusion semantics extracted during training. Experimental results on four benchmark cross-modal retrieval datasets (Wikipedia, Pascal-Sentence, NUS-WIDE, and XmediaNet) show that the proposed model improves the mean average precision (mAP) over the best existing models by 1.7%, 5.1%, 1.6%, and 5.4%, respectively. The source code is available at https://gitee.com/MrSummer123/CPCMR.
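
    To make the mechanism above concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation; their code is in the repository linked above) of a modality mapping network whose cross-attention takes learned typical concepts as keys and values and maps a single-modality query feature into the shared cross-modal subspace. The class name ConceptDrivenMapper, the feature dimensions, and the single-layer design are illustrative assumptions.

        # Minimal illustrative sketch (assumed PyTorch framework), not the CPCMR implementation.
        import torch
        import torch.nn as nn

        class ConceptDrivenMapper(nn.Module):
            """Maps a single-modality feature into the cross-modal subspace by
            attending over learned typical concepts (hypothetical dimensions)."""
            def __init__(self, feat_dim=512, num_concepts=32, num_heads=8):
                super().__init__()
                # Learnable typical concepts, assumed here to be shared across modalities.
                self.concepts = nn.Parameter(torch.randn(num_concepts, feat_dim))
                # Cross-attention: query = modality feature, key/value = typical concepts.
                self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
                self.proj = nn.Linear(feat_dim, feat_dim)

            def forward(self, query_feat):
                # query_feat: (batch, 1, feat_dim) feature of the query modality (image or text).
                kv = self.concepts.unsqueeze(0).expand(query_feat.size(0), -1, -1)
                attended, _ = self.cross_attn(query_feat, kv, kv)
                return self.proj(attended.squeeze(1))  # cross-modal retrieval feature

        # Usage: embed image queries and text candidates, then rank by cosine similarity.
        mapper = ConceptDrivenMapper()
        img_feat = torch.randn(4, 1, 512)   # e.g. features from a pretrained image encoder
        txt_feat = torch.randn(4, 1, 512)   # e.g. features from a pretrained text encoder
        img_emb = nn.functional.normalize(mapper(img_feat), dim=-1)
        txt_emb = nn.functional.normalize(mapper(txt_feat), dim=-1)
        scores = img_emb @ txt_emb.t()      # (4, 4) image-to-text similarity matrix

    In this sketch the concepts are shared, so image and text features land in the same subspace after mapping; in the actual model, each modality presumably has its own mapping network trained to match the fusion semantics produced by the multi-modal Transformer.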

     
