
Automatic Reconstruction of Cross-Cut Chinese Documents Using Information Quantity

  • Abstract: Shape features of the shreds cannot be used when automatically reconstructing cross-cut Chinese documents, so this paper proposes a reconstruction algorithm based on the information quantity (richness) of shred content. First, we analyze how to represent shred features based on the Chinese characters they contain. Building on this, we propose a method that estimates the feature matching degree between shreds in both the horizontal and vertical directions. Finally, information quantity is used to determine the splicing order, and the shreds are spliced efficiently one by one. Matching experiments with different numbers of shreds show that, compared with traditional methods, the horizontal and vertical matching estimates improve accuracy by about 4.73% and 3.76%, respectively. Automatic reconstruction experiments show that, compared with traditional algorithms, the information quantity based algorithm reduces the error rate by about 18% and greatly reduces time complexity.
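The abstract does not give the feature representation or the matching formula, so the following is only a minimal sketch of the general idea of letting information quantity decide the splicing order, restricted to the horizontal direction. It assumes each shred is a small binary image (1 = ink), uses a simple edge-pixel mismatch count as a stand-in for the paper's character-based matching degree, and takes the number of ink pixels on an edge as its information quantity. All function names and the toy data are hypothetical.

```python
# Illustrative only: edge-pixel mismatch and ink-pixel counts are assumptions,
# not the character-based features or matching degree used in the paper.

def right_edge(shred):
    """Last pixel column of a shred (a list of rows of 0/1 values)."""
    return [row[-1] for row in shred]

def left_edge(shred):
    """First pixel column of a shred."""
    return [row[0] for row in shred]

def mismatch(edge_a, edge_b):
    """Number of rows where the two edges disagree (lower is a better match)."""
    return sum(1 for x, y in zip(edge_a, edge_b) if x != y)

def information_quantity(edge):
    """Assumed measure: count of ink pixels on the edge."""
    return sum(edge)

def reconstruct_row(shreds):
    """Return shred indices in left-to-right order for a single row of shreds."""
    n = len(shreds)
    successor = {}            # index -> index of the shred placed to its right
    has_predecessor = set()   # shreds already attached to some left neighbor

    def chain_end(i):
        # Follow existing links to the right-most shred of i's chain.
        while i in successor:
            i = successor[i]
        return i

    # Splice the most informative right edges first: they are the least ambiguous.
    order = sorted(range(n),
                   key=lambda i: information_quantity(right_edge(shreds[i])),
                   reverse=True)
    for i in order:
        if len(successor) == n - 1:
            break  # all shreds are already linked into one chain
        candidates = [j for j in range(n)
                      if j != i
                      and j not in has_predecessor
                      and chain_end(j) != i]  # linking i -> j must not close a cycle
        best = min(candidates,
                   key=lambda j: mismatch(right_edge(shreds[i]), left_edge(shreds[j])))
        successor[i] = best
        has_predecessor.add(best)

    # The left-most shred is the only one without a predecessor.
    start = next(i for i in range(n) if i not in has_predecessor)
    result = [start]
    while result[-1] in successor:
        result.append(successor[result[-1]])
    return result

# Toy strips cut from one 4x8 image; A is the left margin, D the right margin.
A = [[0, 1], [0, 0], [0, 1], [0, 0]]
B = [[1, 1], [0, 1], [1, 0], [0, 0]]
C = [[1, 0], [1, 1], [0, 1], [0, 1]]
D = [[0, 0], [1, 0], [1, 0], [1, 0]]

shreds = [C, A, D, B]            # shuffled input
order = reconstruct_row(shreds)
print(order)                     # [1, 3, 0, 2], i.e. A, B, C, D
```

In this sketch, the blank right edge of the right-margin strip has zero information quantity and is therefore considered last, by which point every other strip has already been linked. A full cross-cut reconstruction would also need the vertical matching step described in the abstract, which is omitted here.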

     
