Citation: Jiang Hailang, Liu Jianming. Fine-Grained Image Recognition via Multi-Part Learning[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(7): 1032-1039. DOI: 10.3724/SP.J.1089.2023.19537

Fine-Grained Image Recognition via Multi-Part Learning

  • Abstract: Existing fine-grained recognition methods mainly use attention to locate subtle parts. However, convolutional neural networks (CNNs) trained with the cross-entropy loss tend to learn only the most discriminative part of an object and ignore secondary parts that are equally important for discrimination. To discover as many discriminative local parts as possible, a fine-grained image recognition method via multi-part learning (MPL) is presented. First, a parameter-free data augmentation module named Semantic Patch Mix is proposed. It exchanges the most discriminative parts between the two images of a pair, which augments the training data without introducing irrelevant background noise and improves the network's generalization and its robustness to input perturbations. Second, a parameter-free multi-part adversarial erasing module is proposed, which erases the most discriminative region of the feature map under the guidance of attention and a Bernoulli distribution, forcing the network to discover other discriminative regions. The attention guidance ensures that the erased regions are sufficiently discriminative, and the Bernoulli guidance ensures that the erased regions are diverse. Finally, mid-level features are fused to further improve performance. The proposed method is model-agnostic and can serve as a plug-and-play module for a variety of existing backbone networks. With ResNet-50 as the backbone, the method reaches classification accuracies of 89.2%, 95.5% and 94.0% on three public datasets, CUB-200-2011, FGVC-Aircraft and Stanford Cars, respectively. The experiments show that the method discovers more discriminative parts and is more accurate than the compared methods that use the same backbone.
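The abstract does not give the exact formulation of the erasing module, so the following is only a minimal sketch of the general idea of attention- and Bernoulli-guided erasing on a feature map. It rests on several assumptions that do not come from the paper: the attention map is taken as the channel-wise mean of non-negative (post-ReLU) activations, candidate positions are those above a fraction of the peak attention, and the names `attention_bernoulli_erase`, `peak_fraction` and `erase_prob` are hypothetical.

```python
import numpy as np


def attention_bernoulli_erase(features, peak_fraction=0.5, erase_prob=0.5, rng=None):
    """Erase highly attended positions of a (C, H, W) feature map.

    Illustrative sketch only, not the paper's implementation.
    Assumes `features` is non-negative, e.g. taken after a ReLU.
    Returns the erased features and the (H, W) binary mask that was applied.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Assumed attention map: channel-wise mean activation, shape (H, W).
    attention = features.mean(axis=0)

    # Attention guidance: only positions near the attention peak are candidates,
    # so whatever gets erased is discriminative.
    candidates = attention >= peak_fraction * attention.max()

    # Bernoulli guidance: each candidate is erased independently with
    # probability `erase_prob`, so the erased region varies between calls.
    draws = rng.random(attention.shape) < erase_prob
    erase = candidates & draws

    # Mask is 0 where erased and 1 elsewhere; it broadcasts over channels.
    mask = (~erase).astype(features.dtype)
    return features * mask, mask


# Usage on a random stand-in for a post-ReLU feature map.
rng = np.random.default_rng(0)
feats = np.abs(rng.standard_normal((8, 4, 4)))
erased, mask = attention_bernoulli_erase(feats, rng=rng)
```

In a training loop, the erased features would be passed to a classifier head so that the loss has to rely on the remaining, less dominant regions. At inference time no erasing would normally be applied.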

     
