Image Emotion Recognition via Fusion Multi-Level Representations
-
Graphical Abstract
-
Abstract
Most current methods for image sentiment analysis focus on either low-level visual features or high-level semantics and fail to consider features at different levels together. Visualizations indicate that hierarchical networks rely mainly on deep semantic information and ignore shallow visual details, even though these details are crucial for evoking emotion. To address this problem, we propose a multi-level hybrid representation model. A shallow visual feature extractor is embedded into the backbone as branches to obtain shallow visual information. The deep semantics of the backbone and the shallow visual detail representations of the branches are then aggregated by a fusion layer into hybrid representations for emotion recognition. In addition, a loss function is introduced to optimize the model when the samples in image emotion datasets are imbalanced. Experiments on six image emotion datasets covering both ordinary and art images, including FI and Emotion6, show that emotion recognition accuracy improves by more than 1.5%.
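The pipeline summarized in the abstract can be pictured with a minimal, framework-free sketch. The abstract does not specify the fusion operator or the loss, so the sketch assumes two illustrative stand-ins: concatenation of globally average-pooled features for the fusion layer, and an inverse-frequency weighted cross-entropy for the imbalance-aware loss. All function names are hypothetical and are not taken from the paper.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def fuse(shallow_maps, deep_map):
    """Assumed fusion: concatenate pooled shallow-branch vectors with the pooled deep vector."""
    hybrid = []
    for fmap in shallow_maps:
        hybrid.extend(global_average_pool(fmap))
    hybrid.extend(global_average_pool(deep_map))
    return hybrid


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(class_counts):
    """Assumed weighting: total / (num_classes * count), so rare classes weigh more."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * count) for count in class_counts]


def weighted_cross_entropy(logits, label, class_weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -class_weights[label] * math.log(probs[label])


# Toy inputs: one shallow branch (1 channel, 2x2) and a deep map (2 channels, 1x1).
shallow = [[[[1.0, 3.0], [5.0, 7.0]]]]
deep = [[[2.0]], [[6.0]]]
hybrid = fuse(shallow, deep)

# Toy two-class dataset with a 90/10 split; the rare class gets the larger weight.
weights = inverse_frequency_weights([90, 10])
loss_rare = weighted_cross_entropy([0.0, 0.0], 1, weights)
loss_common = weighted_cross_entropy([0.0, 0.0], 0, weights)
```

In this toy setting the hybrid vector simply stacks one pooled shallow value on top of two pooled deep values, and an equally uncertain prediction is penalized more heavily for the rare class than for the common one. A real model would replace the pooling and concatenation with learned layers, and the weighting with whatever loss the paper actually defines.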
-
-