Abstract:
Iris images captured in less-constrained environments are susceptible to noise such as specular reflections, eyelash and hair occlusions, and motion and defocus blur, which makes it difficult to accurately segment the valid iris region. To address this problem, a symmetrical encoder-decoder network with Transformer is proposed for noise-robust iris segmentation. First, Swin Transformer is employed as the encoder: a sequence of input image patches is fed into hierarchical Transformer modules so as to model the long-range dependencies among image pixels through the self-attention mechanism and enhance the interaction of contextual information. Second, a Transformer decoder symmetrical with the encoder is constructed, in which the high-order context features extracted earlier are decoded over multiple layers. In addition, skip connections are introduced to fuse the multi-scale features from the encoder with the up-sampled decoded features, which reduces the loss of spatial position information caused by down-sampling. Finally, supervised learning is applied at each stage of the decoder, which improves the quality of the features extracted at different scales. Comparative experiments are carried out on three challenging noisy near-infrared (NIR) and visible (VIS) iris datasets, i.e., NICE.I, CASIA.v4-distance, and MICHE-I. Results show that the proposed method achieves better segmentation performance than several benchmark methods, including traditional methods, convolutional neural network-based methods, and existing Transformer-based methods, on multiple evaluation metrics such as E1, E2, F1, and mIoU, and in particular demonstrates significant advantages in reducing the interference of adverse noise. An iris recognition experiment on the CASIA.v4-distance dataset also shows that the proposed method can effectively improve recognition performance, suggesting good application potential.