A Method for Bilingual Tibetan-Chinese Scene Image Dataset Synthesis and Text Detection

Hao Yusheng (郝玉胜); Wang Weilan (王维兰); Li Jincheng (李金成); Lin Qiang (林强)

doi:10.3724/SP.J.1089.2022.18954

Abstract: To meet the need for bilingual Tibetan-Chinese text detection in the large number of real-world scene images, a dataset is synthesized and a deep-learning model is trained, providing a solution to Tibetan-Chinese text detection in scene images. First, to address the lack of a bilingual Tibetan-Chinese scene image dataset with enough annotated samples, a synthesis method based on contour detection and Poisson image editing is proposed. Combining manual annotation with automatic synthesis, a bilingual Tibetan-Chinese scene image dataset (BiTCSD) of considerable size is built, containing 87,680 synthetic images and 5,550 manually annotated images. Second, the effectiveness of training a model on the synthetic dataset is verified. Third, the deep-learning-based connectionist text proposal network (CTPN) is trained on different datasets, and its text detection performance is evaluated separately for each language on the test set. The experimental results show that training CTPN on synthetic samples greatly improves its detection performance, and that the trained CTPN detects bilingual Tibetan-Chinese text lines in scene images with high precision and recall: for Tibetan text, the precision P, recall R and F value are 0.91, 0.85 and 0.88, respectively; for Chinese text, they are 0.89, 0.83 and 0.86, respectively.
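The synthesis method rests on Poisson image editing: a pasted text region keeps its own gradients (the strokes), while its absolute intensities are solved so that they agree with the background along the region's boundary. The following is a generic, minimal Python sketch of that idea, not the authors' BiTCSD pipeline. It assumes grayscale images stored as nested lists and a binary mask that is already given (in the described method such a region would come from contour detection, which is not shown). The function name `poisson_blend`, its parameters and the plain Jacobi solver are hypothetical, illustrative choices.

```python
# Generic sketch of Poisson image editing ("seamless cloning") on small
# grayscale images held as nested lists. Illustrative only; this is NOT
# the authors' synthesis code. `poisson_blend` is a hypothetical name.

def poisson_blend(target, source, mask, iterations=500):
    """Blend `source` into `target` inside `mask` using Jacobi iteration.

    target, source: equally sized 2-D lists of numbers (grayscale).
    mask: 2-D list of bools; True marks the region to be filled.
    Solves the discrete Poisson equation whose guidance field is the
    source gradient, with target values fixed outside the region.
    """
    h, w = len(target), len(target[0])
    # Every masked pixel needs four in-image neighbours.
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (y in (0, h - 1) or x in (0, w - 1)):
                raise ValueError("mask must not touch the image border")

    result = [[float(v) for v in row] for row in target]
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(iterations):
        updated = [row[:] for row in result]
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                if not mask[y][x]:
                    continue
                total = 0.0
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    # Current estimate inside the mask; fixed target value
                    # outside it (Dirichlet boundary condition).
                    total += result[ny][nx]
                    # Guidance term: source gradient toward this neighbour.
                    total += source[y][x] - source[ny][nx]
                updated[y][x] = total / 4.0
        result = updated
    return result


# Usage: paste one bright "stroke" pixel onto a uniform 50-level background.
background = [[50.0] * 5 for _ in range(5)]
text_patch = [[0.0] * 5 for _ in range(5)]
text_patch[2][2] = 200.0
region = [[False] * 5 for _ in range(5)]
region[2][2] = True
out = poisson_blend(background, text_patch, region)
print(out[2][2])  # → 250.0
```

The pasted pixel keeps its +200 contrast against its neighbours but is placed relative to the 50-level background (50 + 200 = 250) instead of being copied at its original value; this is what lets synthetic text blend into a scene without visible seams. Plain Jacobi iteration is used only for readability; for real images a sparse linear solver, or OpenCV's `cv2.seamlessClone` (which implements Poisson-based seamless cloning), would be the practical choice.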
