Deep Multimodal Clustering Based on Self-Supervised Information Entropy Learning
Abstract
To maintain consistency of the clustering space across modalities and eliminate irrelevant information within each modality, we propose a deep multimodal clustering algorithm based on self-supervised information entropy learning. First, a multimodal convolutional autoencoder is trained with a reconstruction task to extract low-dimensional latent features. Next, deep embedding is used to learn an ideal common clustering space shared by all modalities; this common space serves as a set of self-supervised labels that guide each modality's clustering subspace toward the ideal state, so that the latent features of the modalities follow similar distributions. Finally, drawing on information entropy theory, the mutual information between the labels and each modality's latent features is constrained, preserving inter-modality correlation while reducing redundancy within each modality's data. Experiments are conducted on the Fashion-MNIST, COIL-20, FRGC, YTF, RGB-D, and Noisy-MNIST benchmark datasets. The results show that the proposed algorithm outperforms the comparison algorithms on the ACC and NMI clustering metrics; on Fashion-MNIST in particular, ACC improves by 2.2 percentage points over the state-of-the-art StSNE algorithm.
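The self-supervised labeling step described above follows the common deep-embedding pattern in which soft cluster assignments are sharpened into a target distribution that supervises each modality. The following is a minimal NumPy sketch of that pattern only; the function names, the Student's t assignment kernel, and the DEC-style target distribution are illustrative assumptions, not the paper's exact formulation, and the full method would also include the reconstruction and mutual-information terms.

```python
import numpy as np

def student_t_assignments(z, centers, alpha=1.0):
    """Soft cluster assignments q_ij from latent features z to cluster centers
    (Student's t kernel, an assumed DEC-style choice)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened auxiliary distribution p_ij, used as the self-supervised
    labels that guide each modality's clustering subspace."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q):
    """KL(p || q): pulls a modality's soft assignments q toward the labels p."""
    return float((p * np.log((p + 1e-12) / (q + 1e-12))).sum())

def mean_entropy(q):
    """Average assignment entropy; an entropy/mutual-information style term
    could penalize this to reduce redundancy within a modality."""
    return float(-(q * np.log(q + 1e-12)).sum(axis=1).mean())
```

In this sketch, `q` would be computed per modality from its latent features, `p` from the shared clustering space, and the KL term drives the modality-specific subspaces toward the common one.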