Citation: Dai Jin, Chen Ying. Joint Linear Discrimination and Graph Regularization for Task-Oriented Cross-Modal Retrieval[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 106-115. DOI: 10.3724/SP.J.1089.2021.18345

Joint Linear Discrimination and Graph Regularization for Task-Oriented Cross-Modal Retrieval

  • Abstract: Existing cross-modal retrieval methods based on a common subspace give insufficient consideration to the differences between retrieval tasks and to the semantic consistency of retrieval-modality data. To address this, a task-oriented cross-modal retrieval method that combines linear discriminant analysis with graph regularization is proposed. Within a joint learning framework, the method constructs a separate mapping mechanism for each retrieval task and maps data from different modalities into a common subspace for similarity measurement. During learning, correlation analysis and single-modal semantic regression are combined to preserve the correlation between paired data and to enhance the semantic accuracy of query-modality samples, while linear discriminant analysis is used to ensure the semantic consistency of retrieval-modality samples. Local neighbor graphs are also constructed for the data of each modality to preserve structural information, which improves cross-modal retrieval performance. Experimental results on two cross-modal datasets, Wikipedia and Pascal Sentence, show that the average mAP of the proposed method across the retrieval tasks is higher than that of twelve existing methods by 1.0% to 16.0% and 1.2% to 14.0%, respectively.
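The results above are reported as mean average precision (mAP) over ranked retrieval lists in the common subspace. As a rough illustration of how such a score is commonly computed, the Python sketch below ranks gallery items of one modality against queries of the other by cosine similarity and averages the per-query precision. The function names, the use of cosine similarity, and the rule that an item is relevant when it shares the query's class label are assumptions made for illustration; the abstract does not specify the paper's exact evaluation protocol.

```python
import numpy as np


def cosine_similarity(queries, gallery):
    """Pairwise cosine similarity between rows of two matrices."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T


def mean_average_precision(queries, query_labels, gallery, gallery_labels):
    """mAP of ranking `gallery` for each query (both already projected
    into a shared subspace). Relevance = same class label (assumption)."""
    sims = cosine_similarity(queries, gallery)
    average_precisions = []
    for i in range(sims.shape[0]):
        order = np.argsort(-sims[i])  # most similar gallery item first
        relevant = (gallery_labels[order] == query_labels[i]).astype(float)
        if relevant.sum() == 0:
            continue  # skip queries with no relevant gallery item
        precision_at_k = np.cumsum(relevant) / np.arange(1, relevant.size + 1)
        average_precisions.append(
            (precision_at_k * relevant).sum() / relevant.sum()
        )
    return float(np.mean(average_precisions)) if average_precisions else 0.0
```

For an image-to-text task, `queries` would hold projected image features and `gallery` projected text features; swapping the two gives the text-to-image score, and averaging both gives a figure comparable in spirit to the "average mAP" quoted above.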

     
