Citation: Zhao Qilu, Dou Zhanghong, Li Zongmin. Supervised Multi-modal Dictionary Learning for Image Representation[J]. Journal of Computer-Aided Design & Computer Graphics, 2017, 29(11): 1997-2004.

Supervised Multi-modal Dictionary Learning for Image Representation

• Abstract: By combining multi-modal visual features with category label information, a supervised multi-modal dictionary learning method is proposed for image representation. First, a multi-modal dictionary learning algorithm learns a "shared + private" sparse feature from four visual modalities (texture, color, shape, and structure) to describe the visual characteristics of an object. Then, a Laplacian regularization term makes the learned sparse feature reflect the semantic information carried by the category labels, enhancing its discriminative power. Experiments on image classification show that the proposed method outperforms single-modality features and other baseline multi-modal feature learning methods.
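The abstract describes the objective only in words. As a rough, hypothetical sketch of what a "shared + private" multi-modal dictionary learning objective with a label-driven Laplacian regularizer might look like (the paper's exact formulation may differ, and every symbol below is introduced here purely for illustration): let X^{(m)} be the feature matrix of modality m among the M = 4 modalities (texture, color, shape, structure), D_s^{(m)} and D_p^{(m)} its shared and private dictionaries, S and P^{(m)} the shared and private sparse codes, Z the column-wise stack of all codes, W a same-class affinity matrix built from the labels, and L its graph Laplacian:

\[
\min_{\{D_s^{(m)},\,D_p^{(m)},\,P^{(m)}\},\,S}\;
\sum_{m=1}^{M}\bigl\|X^{(m)}-D_s^{(m)}S-D_p^{(m)}P^{(m)}\bigr\|_F^{2}
\;+\;\lambda\Bigl(\|S\|_{1}+\sum_{m=1}^{M}\|P^{(m)}\|_{1}\Bigr)
\;+\;\beta\,\mathrm{tr}\bigl(ZLZ^{\top}\bigr)
\]

Since \(\mathrm{tr}(ZLZ^{\top})=\tfrac{1}{2}\sum_{i,j}W_{ij}\,\|z_i-z_j\|_2^{2}\), the last term pulls the sparse codes of same-class samples toward each other, which is the intended effect of the Laplacian regularization described in the abstract; \(\lambda\) and \(\beta\) are trade-off weights.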

     
