Citation: Shuyuan Li, Caiping Yan, Hong Li. A Hybrid Transformer Network for Image Splicing Forgery Detection[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.2024.20099

A Hybrid Transformer Network for Image Splicing Forgery Detection

Abstract: In recent years, many frameworks based on convolutional neural networks have been proposed for image splicing forgery detection. However, because the scale of tampered regions varies, most existing methods fail to achieve satisfactory performance, especially on large-scale objects. To obtain accurate forgery localization, a hybrid Transformer network that integrates self-attention and cross-attention into U2-Net is proposed for image splicing forgery detection. Specifically, self-attention is applied in the last block of the encoder to capture long-range semantic dependencies, so that the network can locate large-scale tampered regions more completely. Meanwhile, a cross-attention module is designed for the skip connections; guided by high-level semantic information, it enhances the low-level feature maps and filters out non-semantic features, achieving finer spatial recovery. The hybrid network thus combines the advantages of Transformer self-attention and cross-attention and can capture more semantic information and spatial dependencies at different scales. In other words, by fusing convolution and Transformer, the proposed network can locate spliced regions of various sizes without pre-training on a large number of images. In comparisons with four traditional methods and six recent deep learning methods on two public datasets, Casia2.0 and Columbia, the proposed method achieves better performance.
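The two attention mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the learned query/key/value projections, multi-head structure, and normalization that a real Transformer block would have, and the choice of low-level features as queries with high-level features as keys and values is an assumption about how "guidance" is realized. The function names `self_attention` and `cross_attention_skip` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of equal-width vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def self_attention(tokens):
    """Every token attends to every other token (long-range dependencies)."""
    return attention(tokens, tokens, tokens)


def cross_attention_skip(low_tokens, high_tokens):
    """Assumed skip-connection form: low-level tokens query high-level
    tokens, and the attended result is added back as a residual."""
    attended = attention(low_tokens, high_tokens, high_tokens)
    return [[l + a for l, a in zip(lt, at)]
            for lt, at in zip(low_tokens, attended)]


if __name__ == "__main__":
    high = [[1.0, 0.0], [0.0, 1.0]]   # toy high-level (semantic) tokens
    low = [[0.0, 0.0]]                # toy low-level (spatial) token
    # A zero query scores all keys equally, so it receives their mean.
    print(cross_attention_skip(low, high))  # [[0.5, 0.5]]
    print(self_attention(high))
```

In an actual network the tokens would be flattened feature-map positions, and both modules would sit inside the U2-Net encoder and skip connections described in the abstract.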

     
