Citation: Li Zhixin, Hou Chuanwen, Xie Xiumin. Enhancing Cross-Modal Hash Retrieval with Multiple Similarity Matrices[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(6): 933-945. DOI: 10.3724/SP.J.1089.2022.19044

Enhancing Cross-Modal Hash Retrieval with Multiple Similarity Matrices

Abstract: To further improve cross-modal retrieval performance, a cross-modal hash retrieval method that integrates multi-level similarity information is proposed. First, a self-attention mechanism is used to enhance the text features, and a new fusion feature is constructed from the original features and hash features of the different modalities. Then, three auxiliary similarity matrices are constructed from these three kinds of features, and a fourth auxiliary similarity matrix is constructed by a weighted combination method. Finally, the four matrices are used to compute loss functions between different similarity matrices and between different modalities. Because the four matrices cover both different feature forms and different matrix construction methods, they express the similarity information of the different modalities better and improve retrieval performance. Experiments on three benchmark datasets, Wikipedia, MIRFlickr and NUS-WIDE, show that the mAP values of the proposed method at different code lengths are higher than those of many state-of-the-art methods, which verifies its effectiveness and robustness.
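
The construction of the auxiliary similarity matrices can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the fusion rule, or the weights, so the cosine similarity, the concatenation-based fusion, the combination of the three matrices, the weights (0.4, 0.3, 0.3) and all function and variable names below are assumptions made for illustration only.

```python
import math


def cosine_similarity_matrix(features):
    """Return the pairwise cosine similarity matrix of a list of vectors."""
    # Guard against zero vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in features]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(features, norms)]
        for u, nu in zip(features, norms)
    ]


def weighted_combination(matrices, weights):
    """Combine equally sized square matrices element-wise with the weights."""
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(n)]
        for i in range(n)
    ]


# Toy data for three samples (hypothetical values).
original = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [1.0, 0.1, 1.9]]  # original features
hashed = [[1, -1, 1, 1], [-1, 1, 1, -1], [1, -1, 1, -1]]        # hash features

# Assumption: the fusion feature is a plain concatenation of the two.
fused = [o + h for o, h in zip(original, hashed)]

# Three auxiliary similarity matrices, one per feature form.
s_orig = cosine_similarity_matrix(original)
s_hash = cosine_similarity_matrix(hashed)
s_fused = cosine_similarity_matrix(fused)

# Fourth matrix: a weighted combination (assumed weights, summing to 1).
s_comb = weighted_combination([s_orig, s_hash, s_fused], [0.4, 0.3, 0.3])
```

In a training setup, matrices such as `s_orig`, `s_hash`, `s_fused` and `s_comb` would serve as targets or constraints in the loss terms between similarity matrices and between modalities; the exact loss definitions are given in the paper, not in the abstract.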

     
