Fine-grained Image Recognition via Multi-part Learning
Abstract: Existing fine-grained recognition methods mainly rely on attention to locate subtle parts. However, convolutional neural networks (CNNs) trained with the cross-entropy loss tend to learn only the most discriminative part of an object and ignore secondary parts that are equally informative. To discover as many discriminative local parts as possible, this paper presents a fine-grained image recognition method via multi-part learning (MPL). First, a parameter-free data augmentation module named Semantic Patch Mix is proposed. By exchanging the most discriminative parts between the two images of a pair, it augments the training data without introducing irrelevant background noise, improving both the network's robustness to input perturbations and its generalization to the test distribution. Second, a parameter-free multi-part adversarial erasing module is proposed, which erases the most discriminative regions of the feature map under the guidance of attention and a Bernoulli distribution, forcing the network to discover other discriminative regions of the object. The attention guidance ensures that the erased regions are sufficiently discriminative, and the Bernoulli guidance ensures that they are diverse. Finally, mid-level features are fused to further improve performance. The proposed method is model-agnostic and can therefore serve as a plug-and-play module combined with various backbone networks. With ResNet-50 as the backbone, the proposed method reaches classification accuracies of 89.2%, 95.5%, and 94.0% on three public datasets, CUB-200-2011, FGVC-Aircraft, and Stanford Cars, respectively. Experimental results show that the proposed method discovers more discriminative parts and achieves higher accuracy than the compared methods that use the same backbone.
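To make the idea of attention- and Bernoulli-guided erasing concrete, the following is a minimal NumPy sketch, not the implementation used in this work. The abstract does not specify how the attention map is computed or how the Bernoulli draw is applied, so the rule below is an assumption: attention is taken as the channel-wise mean of a non-negative (post-ReLU) feature map, candidate positions are those above a fraction `threshold` of the peak attention, and each candidate is erased independently with probability `drop_prob`. The function name and both parameters are hypothetical.

```python
import numpy as np


def attention_bernoulli_erase(features, threshold=0.7, drop_prob=0.5, rng=None):
    """Erase highly attended positions of a (C, H, W) feature map.

    Assumed rule (illustrative only, not taken from the paper):
      1. attention  = channel-wise mean of the features
      2. candidates = positions with attention >= threshold * max(attention)
      3. each candidate is erased independently with probability drop_prob
    Features are assumed to be non-negative, e.g. taken after a ReLU.
    """
    rng = np.random.default_rng() if rng is None else rng

    attention = features.mean(axis=0)                       # shape (H, W)
    candidates = attention >= threshold * attention.max()   # discriminative positions
    coin = rng.random(attention.shape) < drop_prob          # one Bernoulli draw per position
    erase = candidates & coin                               # erase only sampled candidates

    # Zero every channel at the erased positions; the (H, W) mask broadcasts over C.
    erased = np.where(erase, 0.0, features)
    return erased, erase


# Small demonstration on random features with one strongly activated position.
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
features[:, 1, 2] += 5.0
erased, mask = attention_bernoulli_erase(features, drop_prob=1.0, rng=rng)
```

In this sketch the threshold plays the role of the attention guidance (only strongly activated positions can be erased), while the random draw varies which of those positions are removed from one training step to the next, which is one plausible reading of the diversity the Bernoulli guidance is meant to provide.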