Image Tampering Localization Based on Integrated Multiscale Attention
Abstract: Image splicing forgery detection methods based on convolutional neural networks (CNNs) have advanced considerably in recent years. However, because tampered objects vary in size and type, most existing models still fail to deliver satisfactory results. To address these problems, we propose an integrated multi-scale attention network for image tampering localization. Specifically, we add multi-scale dual attention modules, a position attention module and a channel attention module, to the encoder, between two convolution layers of the feature extraction stage. The position attention module captures semantic interdependencies in the spatial dimension by modeling the relationship between any two positions in the feature map, so that each pixel can perceive information from every other position. The channel attention module applies a similar self-attention operation to capture the relationship between any two channel maps, so that each pixel can perceive information from the other channels. In addition, to account for tampered targets of different sizes, the multi-scale attention modules divide the feature maps into multiple subregions, which lets them preserve and highlight details and adapt to tampered regions of various shapes and sizes while still capturing long-range semantic dependencies. This design handles spliced forgeries at different scales better and also reduces the computational cost on high-resolution feature maps. Experimental results show that, on the public CASIA test set, the integrated multi-scale attention network achieves […] and IoU scores of 0.623 and 0.612 (62.3% and 61.2%), respectively, a clear improvement over existing algorithms.
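The two attention operations and the subregion split described in the abstract can be illustrated with a minimal sketch. It is written in plain Python for readability and rests on several assumptions that the abstract does not confirm: attention scores are raw dot products with no learned query, key, or value projections, the residual is added without a learned weight, and the feature map is split into a regular `splits x splits` grid. All function names are hypothetical. This is a simplified illustration, not the network's actual implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Every vector attends to all vectors via dot-product scores.
    Assumption: no learned projections; a plain residual is added."""
    out = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, vectors))
                 for d in range(len(query))]
        out.append([q + m for q, m in zip(query, mixed)])
    return out

def position_attention(fmap, y0, y1, x0, x1):
    """Attention among the pixels of the window rows y0..y1-1, cols x0..x1-1.
    fmap is indexed as fmap[channel][row][col]."""
    coords = [(y, x) for y in range(y0, y1) for x in range(x0, x1)]
    pixels = [[channel[y][x] for channel in fmap] for y, x in coords]
    return dict(zip(coords, self_attention(pixels)))

def channel_attention(fmap):
    """Attention among channels: each channel is one flattened H*W vector."""
    height, width = len(fmap[0]), len(fmap[0][0])
    flat = [[v for row in channel for v in row] for channel in fmap]
    attended = self_attention(flat)
    return [[vec[y * width:(y + 1) * width] for y in range(height)]
            for vec in attended]

def multiscale_position_attention(fmap, splits):
    """Run position attention independently inside each grid cell."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for gy in range(splits):
        for gx in range(splits):
            y0, y1 = gy * height // splits, (gy + 1) * height // splits
            x0, x1 = gx * width // splits, (gx + 1) * width // splits
            cell = position_attention(fmap, y0, y1, x0, x1)
            for (y, x), vec in cell.items():
                for c, value in enumerate(vec):
                    out[c][y][x] = value
    return out

# Toy 2-channel, 4x4 feature map (hypothetical values).
toy = [[[0.1 * (c + 1) * (y * 4 + x) for x in range(4)] for y in range(4)]
       for c in range(2)]
whole_map = multiscale_position_attention(toy, 1)   # one global region
four_cells = multiscale_position_attention(toy, 2)  # 2 x 2 subregions
per_channel = channel_attention(toy)
```

Under these assumptions, a feature map with N pixels needs N² pairwise scores when treated as one region, but only about N²/s² in total when split into an s x s grid, which is consistent with the reduced cost on high-resolution feature maps claimed above.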