Zhao Peng, Ma Taiyu, Li Yi, Liu Huiting. Cross-Modal Retrieval Based on Full-Modal Autoencoder with Generative Adversarial Mechanism[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(10): 1486-1494. DOI: 10.3724/SP.J.1089.2021.18757

Cross-Modal Retrieval Based on Full-Modal Autoencoder with Generative Adversarial Mechanism

Abstract: Existing cross-modal retrieval methods based on generative adversarial networks cannot fully explore inter-modality invariance. To solve this problem, a novel cross-modal retrieval method based on a full-modal autoencoder with a generative adversarial mechanism is proposed. Two parallel full-modal autoencoders are introduced to embed samples of different modalities into a common space; each full-modal autoencoder reconstructs not only the feature representation of its own modality but also that of the other modality. A classifier is designed to predict the categories of the embedded features in the common space, preserving the semantic discriminative information of the samples. Three discriminators are designed, each determining the modal category of its input features, and the three work cooperatively to fully explore inter-modality invariance. Mean average precision (mAP) is used to evaluate retrieval accuracy, and extensive experiments are conducted on three public datasets: Pascal Sentence, Wikipedia, and NUS-WIDE-10k. Compared with ten state-of-the-art cross-modal retrieval methods, including both traditional and deep-learning methods, the proposed method improves mAP by at least 4.8%, 1.4%, and 1.1% on the three datasets, respectively. The experimental results demonstrate the effectiveness of the proposed method.
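To make the architecture described in the abstract concrete, the sketch below shows its main components in PyTorch. It is a minimal illustration only: all dimensions, layer sizes, and module names are our assumptions, as is the guess about where the three discriminators attach; none of this is the paper's reported configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

COMMON_DIM = 1024   # size of the common embedding space (assumed)
IMG_DIM    = 4096   # e.g. CNN image features (assumed)
TXT_DIM    = 300    # e.g. text features (assumed)
N_CLASSES  = 10     # semantic categories, e.g. for NUS-WIDE-10k

class FullModalAutoencoder(nn.Module):
    # Encodes one modality into the common space, then reconstructs the
    # feature representations of BOTH modalities from that embedding.
    def __init__(self, in_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 2048), nn.ReLU(),
                                     nn.Linear(2048, COMMON_DIM))
        # Full-modal reconstruction: one decoder head per modality.
        self.dec_img = nn.Linear(COMMON_DIM, IMG_DIM)
        self.dec_txt = nn.Linear(COMMON_DIM, TXT_DIM)

    def forward(self, x):
        z = self.encoder(x)                        # common-space embedding
        return z, self.dec_img(z), self.dec_txt(z)

def make_discriminator(in_dim):
    # One logit per input: image modality vs. text modality.
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1))

img_ae = FullModalAutoencoder(IMG_DIM)        # one autoencoder per modality
txt_ae = FullModalAutoencoder(TXT_DIM)
classifier = nn.Linear(COMMON_DIM, N_CLASSES) # keeps embeddings discriminative

# Three cooperating discriminators; their attachment points here (common
# embeddings, reconstructed image features, reconstructed text features)
# are an assumption about the paper's design.
d_common = make_discriminator(COMMON_DIM)
d_img    = make_discriminator(IMG_DIM)
d_txt    = make_discriminator(TXT_DIM)

# Forward pass on a toy batch: each autoencoder yields a common-space
# embedding plus reconstructions of both modalities' features.
imgs, txts = torch.randn(8, IMG_DIM), torch.randn(8, TXT_DIM)
z_img, img2img, img2txt = img_ae(imgs)
z_txt, txt2img, txt2txt = txt_ae(txts)
recon_loss = F.mse_loss(img2img, imgs) + F.mse_loss(txt2txt, txts)

Under this sketch, training would alternate between updating the discriminators to identify each feature's modality and updating the autoencoders (together with the classification and reconstruction losses) to fool them; at retrieval time, queries and candidates of either modality would be ranked by distance between their common-space embeddings.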
