Xinyu YAN, MeiJun SUN, YaHong HAN, Zheng WANG. Camouflaged Object Segmentation Based on Dense Multi-Scale Transformer[J]. Journal of Computer-Aided Design & Computer Graphics.


Camouflaged Object Segmentation Based on Dense Multi-Scale Transformer


    Abstract: To realize the full potential of Transformer networks in the task of camouflaged object segmentation, a camouflaged object segmentation method called the dense multi-scale Transformer is proposed. It consists of two main modules: a dual-branch separable dense multi-scale feature extraction module and a fast attention-induced cross-level interaction fusion module. First, a Transformer is used as the backbone feature extractor to acquire features at each level. These features are then fed to the dual-branch separable dense multi-scale feature extraction module, in which densely and progressively connected depthwise separable convolution blocks in a local branch and a global branch extract rich multi-scale contextual features. Finally, the fast attention-induced cross-level interaction fusion module fuses the features across levels; each level of fused features is used to predict a camouflage map, and deep supervision keeps the features at all levels highly consistent in space, concentrating attention on camouflaged features as much as possible while suppressing interference from background noise. Qualitative visual comparisons and quantitative experiments on five evaluation metrics (PR curve, S-measure, F-measure, E-measure, and MAE) against 28 existing mainstream methods on four benchmark datasets, namely CHAMELEON, CAMO, COD10K, and NC4K, demonstrate that the proposed dense multi-scale Transformer is an effective model for camouflaged object segmentation.
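The abstract's dense multi-scale branches are built from depthwise separable convolutions, which is what keeps stacking many such blocks affordable. A minimal sketch of the parameter savings (plain Python; the function names are illustrative, not from the paper):

```python
def conv_params(c_in, c_out, k):
    # Standard convolution: every output channel mixes all input
    # channels over a k x k spatial window.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise step: one k x k filter per input channel (no channel mixing),
    # followed by a pointwise 1 x 1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

# For a typical 64-channel, 3 x 3 layer:
standard = conv_params(64, 64, 3)                   # 36864 weights
separable = depthwise_separable_params(64, 64, 3)   # 4672 weights
print(standard, separable, round(standard / separable, 1))
```

Here the separable form uses roughly 8x fewer weights, which is why the method can afford densely and progressively connected multi-scale blocks in both the local and global branches.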
