Fine-Grained Image Recognition via Multi-Part Learning

Jiang Hailang; Liu Jianming

doi:10.3724/SP.J.1089.2023.19537

Jiang Hailang, Liu Jianming. Fine-Grained Image Recognition via Multi-Part Learning[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(7): 1032-1039. DOI: 10.3724/SP.J.1089.2023.19537

Abstract

Existing methods mainly use attention to locate subtle parts. However, convolutional neural networks (CNNs) trained with the cross-entropy loss learn only the most discriminative part and ignore other meaningful regions. This paper presents a novel fine-grained image recognition method based on multi-part learning (MPL). First, a parameter-free data augmentation method named Semantic Patch Mix is proposed; by exchanging the most discriminative part of the image, it improves the network's generalization on the test distribution and its robustness to input perturbations. Second, a parameter-free multi-part adversarial erasing module is proposed; it erases the most discriminative region under the guidance of attention and a Bernoulli distribution, forcing the network to discover other discriminative regions of the object. The attention guidance ensures that the erased regions are sufficiently discriminative, and the Bernoulli guidance ensures that they are diverse. Finally, mid-level features are incorporated to further improve performance. The proposed method is model-agnostic and can therefore serve as a plug-and-play module for various backbone networks. With ResNet-50 as the backbone, the proposed method reaches classification accuracies of 89.2%, 95.5% and 94.0% on three public data sets, CUB-200-2011, FGVC-Aircraft and Stanford Cars, respectively. Experimental results show that the proposed method discovers more discriminative parts and outperforms state-of-the-art approaches.
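The attention-plus-Bernoulli erasing idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how attention is computed or how regions are chosen, so the channel-mean attention map, the `top_k` candidate count, the `keep_prob` parameter and both function names are assumptions made here for illustration only.

```python
import numpy as np


def attention_map(features):
    """Channel-mean activation of a (C, H, W) feature map, scaled to [0, 1].

    Assumption: the paper may compute attention differently.
    """
    att = features.mean(axis=0)
    att = att - att.min()
    peak = att.max()
    return att / peak if peak > 0 else att


def erase_discriminative_parts(features, top_k=4, keep_prob=0.5, rng=None):
    """Zero a random subset of the top_k most-attended spatial positions.

    Attention selects the candidates (so erased positions are discriminative);
    an independent Bernoulli draw per candidate, erasing with probability
    1 - keep_prob, decides which of them are removed (so erasures vary).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = features.shape
    att = attention_map(features)
    # Flat indices of the top_k highest-attention positions.
    candidates = np.argsort(att.ravel())[-top_k:]
    # True means "erase this candidate".
    erase = rng.random(top_k) >= keep_prob
    mask = np.ones(h * w, dtype=features.dtype)
    mask[candidates[erase]] = 0
    mask = mask.reshape(h, w)
    # Broadcast the spatial mask over all channels.
    return features * mask[None, :, :], mask


if __name__ == "__main__":
    feats = np.random.default_rng(0).random((8, 7, 7))
    erased, mask = erase_discriminative_parts(feats, rng=np.random.default_rng(1))
    print("erased positions:", int((mask == 0).sum()))
```

Because the sketch has no learnable weights, it is consistent with the "parameter-free" description in the abstract; in a training loop one would presumably apply it to an intermediate feature map only during training and leave inference untouched.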
