Image Inpainting with Multi-Modal Attention Mechanism Generative Networks

Wang Shanbao (王山豹); Liang Dong (梁栋); Shen Ling (沈玲)

doi:10.3724/SP.J.1089.2023.19578

Abstract

Image inpainting has important applications in old photo restoration, object removal, and video editing. However, existing inpainting methods based on single-modal attention mechanisms produce results with blurry textures and missing semantics. To address this, we propose an image inpainting method based on a multi-modal attention mechanism generative network. First, a U-Net serves as the backbone, performing encoding, decoding, and skip connections on the damaged image. Then, a feature extraction block and an image inpainting block, both built on the multi-modal attention mechanism, are constructed in the encoding and decoding stages respectively; multi-scale feature fusion yields finer-grained inpainting results. Experiments on the Paris StreetView, CelebA, and Places2 datasets cover three image damage rates and three quantitative metrics (SSIM, PSNR, and L₁), giving 27 comparison items in total (3 datasets × 3 damage rates × 3 metrics). Against the four comparison methods, the proposed method is better on 20 items, equal on 1, and slightly worse on 6, which verifies its effectiveness.
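The metrics named in the abstract compare a restored image against its undamaged ground truth. As a rough illustration only, the sketch below computes two of them, L₁ error and PSNR, on flattened pixel sequences. It assumes pixel values scaled to [0, 1] and takes L₁ to mean the mean absolute error; the abstract does not state the exact definitions used in the paper, and the function names `l1_error` and `psnr` are hypothetical. SSIM is left out because it relies on windowed local statistics.

```python
import math


def _check_inputs(reference, restored):
    # Both images must be flattened to non-empty sequences of equal length.
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def l1_error(reference, restored):
    """Mean absolute error between ground truth and restored pixels (lower is better)."""
    _check_inputs(reference, restored)
    total = sum(abs(r - s) for r, s in zip(reference, restored))
    return total / len(reference)


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB (higher is better).

    max_value is the largest possible pixel value; 1.0 is an assumption
    matching pixels scaled to [0, 1].
    """
    _check_inputs(reference, restored)
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel example: two pixels are off by 0.1, two are exact.
reference = [0.0, 0.5, 1.0, 0.5]
restored = [0.1, 0.5, 0.9, 0.5]

print("L1 error:", l1_error(reference, restored))
print("PSNR (dB):", psnr(reference, restored))
```

Because lower is better for L₁ while higher is better for SSIM and PSNR, a "better" result in the 27-item comparison means a lower L₁ or a higher SSIM or PSNR.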
