Scene Text Recognition by Eliminating Background Noise and Enhancing Character Shape Features
Abstract: To address the failure of existing methods to effectively eliminate background noise and noise from the characters themselves, a text recognition model that eliminates background noise and enhances character shape features (EBEC) is proposed. The model consists of three modules. The EBEC network, enhanced by a spatial attention mechanism, attends only to character-region features, which eliminates background noise and forces the network to learn only character shape features, thereby enhancing them; the feature extraction module uses EfficientNet-B3 as the backbone network to extract feature maps; the primitive representation learning module learns from the feature maps to obtain a visual text representation, which is then decoded to produce the recognition result. Experimental results show that, compared with the classical model, the proposed model improves recognition accuracy by 9.76 percentage points on the synthetic scene dataset and by 2.91 percentage points on average on the public datasets IIIT5K, ICDAR-2003, ICDAR-2015, and CUTE80. The model therefore effectively eliminates background noise and character noise and improves recognition performance.
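The idea of attending only to character regions can be illustrated with a minimal sketch of spatial attention gating. This is not the EBEC network itself: the abstract does not say how the attention map is computed, so the function name `spatial_attention` and the gate used here (a sigmoid of the channel-wise mean) are assumptions chosen only to show how a per-position weight can suppress background activations while preserving character activations.

```python
import math


def spatial_attention(feature_map):
    """Reweight a C x H x W feature map (nested lists) with a per-position gate.

    Assumption: the gate is sigmoid(mean over channels). A trained model would
    normally learn this map; the source does not specify its form.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])

    # One attention weight in (0, 1) for each spatial position.
    attention = []
    for y in range(height):
        row = []
        for x in range(width):
            mean = sum(feature_map[c][y][x] for c in range(channels)) / channels
            row.append(1.0 / (1.0 + math.exp(-mean)))
        attention.append(row)

    # Apply the same spatial weight to every channel at that position.
    gated = [
        [
            [feature_map[c][y][x] * attention[y][x] for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]
    return attention, gated


# Toy input: 2 channels, 1 row, 2 columns.
# Column 0 mimics a character region (strong positive responses),
# column 1 mimics background (strong negative responses).
toy = [
    [[4.0, -4.0]],
    [[6.0, -6.0]],
]
attention, gated = spatial_attention(toy)
```

In this toy input the weight at the character-like position is close to 1 and the weight at the background-like position is close to 0, so the gated features keep the former nearly unchanged and shrink the latter toward zero.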