Image Inpainting with Multi-modal Attention Mechanism Generative Networks
Abstract: Image inpainting has important application value in old-photo restoration, object removal, video editing, and related fields. However, the results of existing inpainting methods based on single-modal attention mechanisms suffer from blurred textures and missing semantics. To address this, we propose an image inpainting method that uses a multi-modal attention mechanism generative network. First, a U-Net is adopted as the backbone to perform the encoding, decoding, and skip-connection operations on the damaged image. Then, in the encoding and decoding stages, a feature extraction module and an image inpainting module based on the multi-modal attention mechanism are constructed, respectively; fusing features across multiple scales yields finer-grained inpainting results. Experiments on the public Paris StreetView, CelebA, and Places2 datasets, covering three image damage rates and three quantitative evaluation metrics, show that across a total of 27 comparison items the proposed method outperforms the four baseline methods on 20 items, ties on 1, and is slightly lower on 6, verifying its effectiveness.
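The abstract does not give the attention formulation itself, but the two ingredients it names, attending over known-region features to fill missing regions and fusing feature maps across scales, can be illustrated with a minimal NumPy sketch. All function names, shapes, and the scaled dot-product form below are illustrative assumptions, not the paper's actual modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contextual_attention(holes, context):
    """Attend hole-region patch features over known-context patches.

    holes:   (M, C) features of patches to be filled
    context: (N, C) features of known (undamaged) patches
    Returns (M, C): each hole patch as a weighted mix of context patches.
    """
    scale = np.sqrt(holes.shape[1])                  # scaled dot product
    weights = softmax(holes @ context.T / scale)     # (M, N), rows sum to 1
    return weights @ context

def fuse_multiscale(fine, coarse):
    """Nearest-neighbour upsample a coarse map and concatenate with a fine map.

    fine: (H, W, C1), coarse: (H//2, W//2, C2)  ->  (H, W, C1 + C2)
    """
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)
    return np.concatenate([fine, up], axis=-1)

# toy feature maps standing in for two decoder scales
fine = rng.normal(size=(8, 8, 4))
coarse = rng.normal(size=(4, 4, 4))
fused = fuse_multiscale(fine, coarse)            # (8, 8, 8)

flat = fused.reshape(-1, fused.shape[-1])        # 64 fused patch features
filled = contextual_attention(flat[:16], flat[16:])
print(fused.shape, filled.shape)                 # (8, 8, 8) (16, 8)
```

In the real network these operations would act on learned convolutional features inside the U-Net encoder and decoder; the sketch only shows why attention lets hole regions borrow texture from known regions, and how coarse semantic maps and fine texture maps can be combined before inpainting.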