Semantic Perception Enhanced Region Pyramid for Multi-Label Image Recognition

胡云青; 陈强龙; 张寅

doi:10.3724/SP.J.1089.2023-00763

Abstract: Existing multi-label image recognition (MLIR) methods rely mainly on deep features and ignore interactions among features at multiple levels. In addition, because object categories, layouts, and scales vary widely, region proposal methods designed for single-label image recognition cannot adequately mine the semantically diverse local regions of multi-label images. To address this, the paper proposes an MLIR method based on a semantic perception enhanced region pyramid, which effectively fuses image features from different levels to generate regions of interest (ROIs) adapted to multi-scale objects. The method introduces two complementary stages. In the global stage, a dual-pathway feature fusion pyramid encodes multi-level features, effectively combining deep semantic information with shallow detail information. In the local stage, a semantic-aware region proposal module and a region refinement module produce ROIs that are both highly semantically diverse and discriminative. By jointly supervising multiple local regions together with the global image, the method achieves significant improvements on benchmarks such as MS-COCO and VOC 2007, raising mAP by 4.3 and 4.2 percentage points, respectively. In multi-label learning with missing labels, the method also outperforms comparable methods by a clear margin.
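The reported gains are stated in mAP, so a small illustration of that metric may help. The following is a minimal sketch of one common definition (non-interpolated average precision per class, averaged over classes); it is not taken from the paper, and benchmark protocols can differ, for example the official VOC 2007 evaluation uses 11-point interpolated AP. The function names and toy scores are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    # Rank images by descending confidence score for this class.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    # AP is undefined for a class with no positives; 0.0 is an assumed convention.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    label_matrix: Sequence[Sequence[int]],
) -> float:
    """mAP: mean of per-class AP. Rows are images, columns are classes."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / num_classes


# Toy data: 4 images, 2 classes (hypothetical values, not results from the paper).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(f"mAP = {100 * mean_average_precision(scores, labels):.1f}%")
```

Under this definition, an improvement of 4.3 percentage points means the class-averaged AP, expressed as a percentage, rises by 4.3 in absolute terms rather than by a relative 4.3%.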
