Abstract:
Model watermarking is an information-hiding technique that allows owners of deep neural network models to protect model ownership. To address the ineffectiveness of existing model watermarking schemes against watermark removal attacks, a model watermarking scheme based on feature combination and weight adversarial training is proposed. The scheme constructs the trigger set by fusing images from the original training dataset from a feature-combination perspective. During watermark embedding, weight adversarial training is applied to the model: perturbations are injected into the model weights to simulate a watermark attack environment, which enhances watermark robustness. Experimental results on the CIFAR-10 dataset demonstrate that, compared with traditional black-box backdoor watermarking schemes, the proposed scheme achieves a watermark extraction rate of 100% under fine-tuning and overwriting attacks. Moreover, across various watermark removal attacks, the extraction rate of the scheme remains above 80%.
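
The two ideas named above, fusing training images into trigger samples and perturbing the weights during embedding, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and not the scheme's actual implementation: the abstract does not specify the fusion operation, so a simple pixel-level blend (`blend_images`) stands in for it; a logistic model stands in for the deep network; and the perturbation is taken as a step of radius `gamma` along the normalized loss gradient. All function and parameter names are hypothetical.

```python
import numpy as np

def blend_images(img_a, img_b, alpha=0.5):
    """Blend two images pixel-wise (assumed stand-in for feature combination)."""
    return alpha * img_a + (1.0 - alpha) * img_b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. w."""
    p = sigmoid(x @ w)
    eps = 1e-12  # avoids log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = x.T @ (p - y) / len(y)
    return loss, grad

def weight_adversarial_step(w, x, y, lr=0.5, gamma=0.05):
    """One training step with an adversarial weight perturbation (assumed form).

    1. Move the weights a distance gamma in the direction that increases
       the loss, imitating an attacker who modifies the weights.
    2. Compute the gradient at the perturbed weights.
    3. Apply that gradient to the original, unperturbed weights.
    """
    _, g = loss_and_grad(w, x, y)
    delta = gamma * g / (np.linalg.norm(g) + 1e-12)
    _, g_adv = loss_and_grad(w + delta, x, y)
    return w - lr * g_adv
```

In this sketch the perturbation is discarded after each step, so the model is trained to keep a low loss in a neighborhood of its weights rather than at a single point. In a watermarking setting, `x` and `y` would include the trigger set and its assigned labels.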