DiffTIN: Diffusion Model-Based Structural-Semantic Fusion Guided Text Image Inpainting

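  • Abstract: Text image inpainting aims to reconstruct damaged text structures so that the text can be accurately detected and recognized. When restoring severely damaged text images, traditional methods based on deep convolutional networks or generative adversarial networks tend to produce semantic distortion and broken strokes. In industrial scenarios, where defect regions are unpredictable and text structures are strictly constrained, existing methods struggle to achieve both pixel-level fidelity and semantic consistency. To address these problems, DiffTIN, a structure-semantic fusion-guided diffusion model for text image inpainting, is proposed. First, an image segmentation network in the structure reconstruction module predicts a global text mask. Next, a scene text recognizer generates semantic priors that guide the diffusion process. Finally, the fused structural-semantic prior is coupled with the latent-space representation of the diffusion model, and a progressive inpainting strategy restores stroke details layer by layer. Experimental results on the public TII-ST dataset show that DiffTIN improves average word recognition accuracy by 1.15 percentage points and peak signal-to-noise ratio by 1.03 dB, outperforming baseline methods such as GSDM and effectively improving the robustness of text recognition.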
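
The abstract does not give DiffTIN's equations. As a hedged reference point, the standard conditional denoising diffusion probabilistic model (DDPM, Ho et al., 2020) run in a latent space is written out below, with the structural and semantic priors folded into a single condition vector c. The symbols \hat{M} (predicted global text mask), \mathbf{s} (recognizer semantic prior) and the fusion operator \phi are notation introduced here for illustration only; how DiffTIN actually couples the priors to the latent and schedules its progressive inpainting is not specified in the abstract.

```latex
% Generic conditional latent DDPM (hypothetical notation, not DiffTIN's published formulation)
% \hat{M}: predicted global text mask; \mathbf{s}: semantic prior; \phi: unspecified fusion operator
\begin{aligned}
\alpha_t &= 1-\beta_t, \qquad \bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i, \qquad \mathbf{c} = \phi(\hat{M}, \mathbf{s}) \\
q(\mathbf{z}_t \mid \mathbf{z}_0) &= \mathcal{N}\!\left(\mathbf{z}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{z}_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right) \\
\mathcal{L} &= \mathbb{E}_{\mathbf{z}_0,\boldsymbol{\epsilon},t}\left\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{z}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},\ t,\ \mathbf{c}\right) \right\|_2^2 \\
\mathbf{z}_{t-1} &= \frac{1}{\sqrt{\alpha_t}}\left( \mathbf{z}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{z}_t, t, \mathbf{c}) \right) + \sigma_t \boldsymbol{\xi}, \qquad \boldsymbol{\xi} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
\end{aligned}
```

Here \boldsymbol{\epsilon}_\theta is the noise-prediction network and \sigma_t^2 = \beta_t is one common choice of sampling variance. Because c enters every reverse step, the structural and semantic priors can steer each denoising update rather than being applied once after sampling.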

    Abstract: Text image inpainting aims to reconstruct damaged textual structures so that text can be accurately detected and recognized. Methods based on deep convolutional networks or generative adversarial networks (GANs) often suffer from semantic distortion and broken strokes when the damage is severe. In industrial scenarios, where defect regions are unpredictable and textual structures are strictly constrained, existing methods struggle to achieve pixel-level fidelity while maintaining semantic consistency. To address these limitations, a structure-semantics fusion-guided diffusion model (DiffTIN) is proposed. It uses a dual-stream guidance mechanism: 1) an image segmentation network in the structure reconstruction module predicts global text masks; 2) semantic priors generated by a scene text recognizer are integrated to guide the diffusion process. The fused structure-semantic priors are coupled with the latent representations of the diffusion model, and a progressive inpainting strategy hierarchically restores stroke details. Experimental results on the public TII-ST dataset show that the proposed method improves average word recognition accuracy by 1.15 percentage points and peak signal-to-noise ratio (PSNR) by 1.03 dB, outperforming baseline methods such as GSDM and effectively improving the robustness of text recognition.
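
To make the two reported metrics concrete, the sketch below computes PSNR and exact-match word accuracy on plain Python lists. It assumes 8-bit images (peak value 255) flattened to one sequence and case-insensitive word matching; the actual TII-ST evaluation protocol may differ. The helper `mse_ratio_from_psnr_gain` is a name introduced here to show what a PSNR gain implies for mean squared error.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE) over flattened pixels."""
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def word_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of words recognized exactly; case-insensitive matching is an assumption."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need one prediction per label and a non-empty label list")
    correct = sum(p.lower() == g.lower() for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def mse_ratio_from_psnr_gain(delta_db: float) -> float:
    """New MSE / old MSE implied by a PSNR gain of delta_db at a fixed peak value."""
    return 10.0 ** (-delta_db / 10.0)


if __name__ == "__main__":
    print(psnr([0, 0, 0, 0], [0, 0, 0, 10]))                      # MSE = 25, ≈ 34.15 dB
    print(word_accuracy(["HELLO", "w0rld"], ["hello", "world"]))  # 50.0
    print(mse_ratio_from_psnr_gain(1.03))                         # ≈ 0.789
```

For a single image, a 1.03 dB PSNR gain means the restored image's MSE shrinks to about 0.79 of its previous value, roughly 21% lower; when PSNR is averaged over a test set this is only an approximation. Likewise, a gain of 1.15 percentage points is an absolute difference between two accuracies (a hypothetical 80.00% rising to 81.15%), not a 1.15% relative increase.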
