Adversarial Projection Learning Based Hashing for Cross-Modal Retrieval
Abstract: Cross-modal hashing has received wide attention in the field of cross-modal retrieval due to its high retrieval efficiency and low storage cost. Most existing cross-modal hashing methods learn hash codes directly from multimodal data and cannot fully exploit the semantic information of the data, so the distribution consistency of low-dimensional features across modalities cannot be guaranteed; one key to solving this problem is to accurately measure the similarity between multimodal data. To this end, an adversarial projection learning based hashing for cross-modal retrieval (APLH) method is proposed, which uses adversarial training to learn low-dimensional features from different modalities and to ensure the distribution consistency of these features across modalities. On this basis, a cross-modal projection matching (CMPM) constraint is introduced, which minimizes the Kullback-Leibler (KL) divergence between the feature projection matching distribution and the label projection matching distribution, so that label information aligns the similarity structure of the low-dimensional features with the similarity structure in the semantic space. Furthermore, in the hash-code learning phase, a weighted cosine triplet loss is introduced to further exploit the semantic information of the data, and, to reduce the quantization loss of the hash codes, the hash function is optimized with a discrete optimization approach. On three cross-modal datasets, MIRFlickr25K, NUS-WIDE, and Wikipedia, the mean average precision (mAP) of the proposed method at various code lengths is better than that of the compared algorithms, which verifies its superiority and robustness in cross-modal hashing retrieval and the effectiveness of CMPM.
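To make the CMPM constraint concrete, the following is a minimal PyTorch sketch of such a loss, assuming mini-batch image and text features and multi-hot label vectors; all variable names are illustrative, and the exact formulation in the paper may differ in details such as symmetrization and smoothing.

```python
import torch
import torch.nn.functional as F

def cmpm_loss(img_feat, txt_feat, labels, eps=1e-8):
    """Sketch of a cross-modal projection matching (CMPM) style loss:
    KL divergence between the feature projection matching distribution
    and the label matching distribution. Assumes `labels` is a float
    multi-hot matrix (batch, num_classes) with at least one label per row."""
    # Label matching distribution q: pairs sharing any label are matches
    match = (labels @ labels.t() > 0).float()            # (B, B) indicator
    q = match / match.sum(dim=1, keepdim=True)           # row-normalized
    # Feature projection matching distribution p: project image features
    # onto the L2-normalized text features, then softmax over the batch
    txt_norm = F.normalize(txt_feat, dim=1)
    p = F.softmax(img_feat @ txt_norm.t(), dim=1)        # (B, B)
    # KL(p || q), averaged over the batch; eps avoids log(0)
    loss_i2t = (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()
    # Symmetric text-to-image term
    img_norm = F.normalize(img_feat, dim=1)
    p_t = F.softmax(txt_feat @ img_norm.t(), dim=1)
    loss_t2i = (p_t * (torch.log(p_t + eps) - torch.log(q + eps))).sum(dim=1).mean()
    return loss_i2t + loss_t2i
```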
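The weighted cosine triplet loss used in the hash-code learning phase can be sketched in the same style. The abstract does not specify the weighting scheme, so the per-triplet `weights` argument below is a placeholder assumption.

```python
def weighted_cosine_triplet_loss(anchor, positive, negative, weights, margin=0.3):
    """Hedged sketch of a weighted cosine triplet loss: each anchor should be
    closer (in cosine similarity) to its positive than to its negative by at
    least `margin`; `weights` rescales each triplet's contribution. The margin
    value and weighting scheme are assumptions, not the paper's definitions."""
    cos_pos = F.cosine_similarity(anchor, positive)   # (B,)
    cos_neg = F.cosine_similarity(anchor, negative)   # (B,)
    return (weights * F.relu(cos_neg - cos_pos + margin)).mean()
```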