Abstract:
To help deep self-attention Transformer networks reach their full potential in camouflaged object segmentation, a camouflaged object segmentation method called the Dense Multi-Scale Transformer is proposed. It consists of two main modules: a two-branch separable dense multi-scale feature extraction module and a fast attention-induced cross-level interaction fusion module. First, a Transformer is used as the backbone feature extractor to obtain features at each level. Second, these features are fed to the two-branch separable dense multi-scale feature extraction module, which extracts rich multi-scale contextual features using densely and recursively connected depthwise separable convolution blocks in a local branch and a global branch. Finally, the fast attention-induced cross-level interaction fusion module fuses the features from all levels. The fused features at each level are used to predict a camouflage map, and deep supervision makes the features at each level highly spatially consistent, focusing attention on the camouflaged object as far as possible while suppressing interference from background noise. Qualitative visualizations and quantitative comparisons with 28 existing mainstream methods on four benchmark datasets (CHAMELEON, CAMO, COD10K and NC4K), using five evaluation metrics (PR curve, S-measure, F-measure, E-measure and MAE), demonstrate that the proposed Dense Multi-Scale Transformer is an effective model for camouflaged object segmentation.
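As an illustration of two of the evaluation metrics named in the abstract, the following minimal sketch computes MAE and a fixed-threshold F-measure for one predicted map against a binary ground-truth mask. It is not the evaluation code used in this work. The function names `mae` and `f_measure`, the threshold of 0.5, and the weighting beta^2 = 0.3 (a common convention in the segmentation literature) are assumptions made for the example, and the maps are plain nested lists so that the sketch needs no external library.

```python
def mae(pred, gt):
    """Mean absolute error between a predicted map and a ground-truth mask.

    Both arguments are 2D nested lists of the same shape with values in [0, 1].
    """
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure at one binarization threshold.

    threshold and beta_sq are assumed values for illustration only.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted_fg = p >= threshold  # binarize the prediction
            actual_fg = g >= 0.5           # ground truth is already binary
            if predicted_fg and actual_fg:
                tp += 1
            elif predicted_fg:
                fp += 1
            elif actual_fg:
                fn += 1
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy 2x2 example (made-up values, not results from the paper).
pred = [[0.9, 0.2], [0.6, 0.1]]
gt = [[1, 0], [0, 0]]
mae_value = mae(pred, gt)
f_value = f_measure(pred, gt)
```

Lower MAE and higher F-measure indicate a better match. In benchmark practice these scores are averaged over all images in a dataset, and the F-measure is often reported over a range of thresholds rather than at a single one.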