Abstract:
Fine-grained image classification is hampered by the difficulty of capturing subtle discriminative features and locating regions of interest. To address this, a multi-scale salient feature bilinear attention classification method is proposed. First, a region patch feature boosting module (RPFBM) is designed, which enhances the expressive power of the feature maps through a region slicing operation that enlarges and captures fine distinguishable features. Second, a multi-branch bilinear attention pooling (MBAP) strategy is proposed, which hierarchically represents the features of the salient parts of the image in a weakly supervised manner and improves attention to local information at different scales. Finally, counterfactual learning is employed to quantify attention quality: the difference between the effect of the learned attention and the effect of irrelevant attention on the final prediction is taken as a measurement index, and the bilinear attention pooling strategy is forced to learn more effective features by maximizing this difference. The proposed method reaches accuracies of 89.3%, 95.0% and 87.6% on three public datasets, CUB-200-2011, Stanford Cars and Stanford Dogs, respectively, which indicates a significant improvement in performance compared with other advanced methods.
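
The counterfactual measurement described in the abstract can be illustrated with a minimal single-branch sketch. It is not the authors' implementation: the function names, the linear classifier, and the use of uniformly random maps as the "irrelevant attention" are all assumptions made for illustration, and the multi-branch and RPFBM components are omitted. Each attention map weights the feature map element-wise, the result is averaged over spatial positions, and the prediction under the learned attention is compared with the prediction under a random attention of the same shape.

```python
import random


def bilinear_attention_pool(features, attentions):
    """Pool a C x H x W feature map with M attention maps (each H x W).

    Returns a flat list of length M * C: for every (attention, channel) pair,
    the spatial mean of attention * feature.
    """
    pooled = []
    for att in attentions:
        for chan in features:
            total, count = 0.0, 0
            for att_row, feat_row in zip(att, chan):
                for a, f in zip(att_row, feat_row):
                    total += a * f
                    count += 1
            pooled.append(total / count)
    return pooled


def linear_logits(vec, weights):
    """Hypothetical linear classifier: one weight row per class."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_attention(m, h, w, rng):
    """Assumed form of 'irrelevant attention': uniform random maps in [0, 1)."""
    return [[[rng.random() for _ in range(w)] for _ in range(h)] for _ in range(m)]


def counterfactual_effect(features, attentions, weights, rng):
    """Logits under the learned attention minus logits under random attention.

    In training, a classification loss on this difference would push the
    learned attention to matter more than an irrelevant one.
    """
    m, h, w = len(attentions), len(attentions[0]), len(attentions[0][0])
    factual = linear_logits(bilinear_attention_pool(features, attentions), weights)
    counterfactual = linear_logits(
        bilinear_attention_pool(features, random_attention(m, h, w, rng)), weights
    )
    return [y - y_cf for y, y_cf in zip(factual, counterfactual)]


# Toy input: 2 channels, 2 x 2 spatial grid, 1 attention map, 2 classes.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 4.0]]]
attentions = [[[1.0, 0.0], [0.0, 1.0]]]
weights = [[1.0, 0.0], [0.0, 1.0]]
effect = counterfactual_effect(features, attentions, weights, random.Random(0))
```

A larger entry in `effect` means the learned attention changes that class score more than a random attention does, which is the quantity the abstract describes maximizing.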