
Robust Model Watermarking Scheme Based on Feature Combination and Weight Adversarial Training

  • Abstract: Model watermarking is an information-hiding technique that lets owners of deep neural network models protect their ownership of those models. To address the inability of existing model watermarking schemes to withstand watermark removal attacks, a model watermarking scheme based on feature combination and weight adversarial training is proposed. The scheme constructs the trigger set by fusing images from the original training dataset from a feature combination perspective. During watermark embedding, the model undergoes weight adversarial training, in which perturbations are injected into the model weights to simulate a watermark attack environment, thereby improving watermark robustness. Experimental results on the CIFAR-10 dataset show that, compared with traditional black-box backdoor watermarking schemes, the proposed scheme can reach a watermark extraction rate of 100% under fine-tuning and overwriting attacks, and keeps the extraction rate above 80% against various watermark removal attacks.
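The abstract does not specify how the weight perturbation is computed, so the following is only a minimal sketch of one common form of weight adversarial training: perturb the weights in the loss-increasing direction, then update the original weights with the gradient taken at the perturbed point. The toy linear model, the function names `awp_step` and `extraction_rate`, the trigger samples, and the values of `lr` and `eps` are all assumptions made for illustration and are not taken from the paper.

```python
import math

def predict(w, x):
    """Sigmoid output of a linear model (a stand-in for a deep network)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, data):
    """Gradient of the mean binary cross-entropy loss with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
    return g

def awp_step(w, data, lr=0.5, eps=0.1):
    """One update: perturb the weights adversarially, then descend (assumed rule)."""
    g = grad(w, data)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    # Move the weights a distance eps in the direction that increases the loss.
    w_adv = [wi + eps * gi / norm for wi, gi in zip(w, g)]
    # Take the gradient at the perturbed weights, but apply it to the original ones.
    g_adv = grad(w_adv, data)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def extraction_rate(w, trigger):
    """Fraction of trigger samples classified as their assigned label."""
    hits = sum((predict(w, x) >= 0.5) == (y == 1) for x, y in trigger)
    return hits / len(trigger)

# Hypothetical toy trigger set: (features, assigned watermark label).
trigger_set = [([1.0, 0.0, 1.0], 1), ([0.0, 1.0, 1.0], 0),
               ([1.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 0)]

weights = [0.0, 0.0, 0.0]
for _ in range(200):
    weights = awp_step(weights, trigger_set)
rate = extraction_rate(weights, trigger_set)
print(f"watermark extraction rate: {rate:.0%}")
```

In a real setting the trigger set would consist of fused CIFAR-10 images and the model would be a deep network; the point of the sketch is only that the watermark is fitted against weights that have already been pushed toward a worse state, which is the intuition the abstract gives for the robustness gain.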

     
