Abstract:
Existing multi-label image recognition (MLIR) methods mainly focus on deep features and ignore multi-level feature interactions. Owing to the large variability of object categories, layouts, and scales, region proposal methods designed for single-label images struggle to adequately mine local regions with high semantic diversity from multi-label images. This paper therefore proposes an MLIR method based on a semantic-perception-enhanced region pyramid, which effectively fuses features from different levels to generate regions of interest adapted to multi-scale objects. The method comprises two complementary stages, global and local. The global stage encodes multi-level features via a dual-pathway feature fusion pyramid that efficiently combines high-level semantic information with low-level detail. The local stage employs a semantic-aware region proposal module and a region refinement module to obtain ROIs with high semantic diversity and strong discriminability. By jointly supervising multiple local regions together with the global image, the method achieves significant improvements on benchmarks such as MS-COCO and VOC 2007, raising mAP by 4.3 and 4.2 percentage points, respectively. The method also shows a clear advantage over existing approaches in multi-label learning with missing labels.
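The dual-pathway feature fusion pyramid mentioned above is, at a high level, reminiscent of FPN/PANet-style designs that run a top-down pass to inject high-level semantics and a bottom-up pass to propagate low-level detail. The following is a minimal PyTorch sketch of that general idea only; the class name, channel widths, and fusion operations are illustrative assumptions and not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathwayFusionPyramid(nn.Module):
    """Illustrative dual-pathway pyramid (hypothetical): a top-down pass adds
    high-level semantics to finer levels, then a bottom-up pass feeds
    low-level detail back to coarser levels."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs align backbone stages to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 smoothing convs applied after each fusion pass
        self.smooth_td = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                       for _ in in_channels)
        self.smooth_bu = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                       for _ in in_channels)

    def forward(self, feats):
        # feats: backbone features ordered from high resolution (low level)
        # to low resolution (high level), e.g. ResNet stages C2-C5
        laterals = [l(f) for l, f in zip(self.lateral, feats)]

        # Top-down pathway: upsample coarse semantic maps and add them to finer ones
        td = [laterals[-1]]
        for i in range(len(laterals) - 2, -1, -1):
            up = F.interpolate(td[0], size=laterals[i].shape[-2:], mode="nearest")
            td.insert(0, laterals[i] + up)
        td = [s(x) for s, x in zip(self.smooth_td, td)]

        # Bottom-up pathway: downsample detailed maps and add them to coarser ones
        bu = [td[0]]
        for i in range(1, len(td)):
            down = F.max_pool2d(bu[-1], kernel_size=2)
            down = F.interpolate(down, size=td[i].shape[-2:], mode="nearest")
            bu.append(td[i] + down)
        return [s(x) for s, x in zip(self.smooth_bu, bu)]


if __name__ == "__main__":
    # Dummy multi-scale backbone features with assumed ResNet-like channel counts
    feats = [torch.randn(1, c, s, s) for c, s in zip((256, 512, 1024, 2048), (56, 28, 14, 7))]
    for level in DualPathwayFusionPyramid()(feats):
        print(level.shape)  # all levels share 256 channels at their own resolution
```

The fused multi-scale outputs of such a pyramid are the kind of features on which a semantic-aware region proposal stage could then generate ROIs for multi-scale objects, as described in the abstract.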