Interactive Image Segmentation Based on Fusion of Two-stage Feature and Transformer Encoder

  • Abstract: To segment the foreground objects that interest users quickly and accurately, and to obtain high-quality annotated segmentation data at low cost, an interactive image segmentation algorithm based on two-stage feature fusion and Transformer encoding is proposed. First, a lightweight Transformer backbone network extracts multi-scale feature encodings from the input image, making better use of contextual information. Next, subjective prior knowledge is introduced through click interaction, and the interactive features are fused into the Transformer network through a primary stage followed by an enhanced stage. Finally, atrous convolution, an attention mechanism, and a multilayer perceptron are combined to decode the feature maps produced by the backbone network. Experimental results show that the proposed algorithm achieves mNoC@90% values of 2.18, 4.04, and 7.39 on the GrabCut, Berkeley, and DAVIS datasets, respectively, outperforming the compared algorithms. Its time and space complexity are lower than those of f-BRS-B, and it remains stable under perturbations of click position and click type. These results indicate that the proposed algorithm can segment objects of interest quickly, accurately, and stably, and can improve the user's interactive experience.
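
The mNoC@90% figures quoted in the abstract are mean click counts: for each image, the number of clicks needed before the predicted mask first reaches an IoU of 0.90 with the ground truth, averaged over the dataset. The following is a minimal sketch of that metric, not the authors' evaluation code. The function names `noc` and `mean_noc` are hypothetical, and the cap of 20 clicks per image is an assumption taken from common practice in interactive segmentation benchmarks; the abstract does not state the cap used.

```python
from typing import Sequence


def noc(ious: Sequence[float], threshold: float = 0.90, max_clicks: int = 20) -> int:
    """Return the number of clicks after which IoU first reaches `threshold`.

    `ious[k - 1]` is the IoU obtained after the k-th click. If the threshold
    is never reached within `max_clicks` (assumed cap), return `max_clicks`.
    """
    for k, iou in enumerate(ious[:max_clicks], start=1):
        if iou >= threshold:
            return k
    return max_clicks


def mean_noc(per_image_ious: Sequence[Sequence[float]],
             threshold: float = 0.90,
             max_clicks: int = 20) -> float:
    """Average the per-image click counts over a dataset (mNoC@threshold)."""
    counts = [noc(seq, threshold, max_clicks) for seq in per_image_ious]
    return sum(counts) / len(counts)
```

Under this definition a lower value is better, since it means fewer user clicks are needed to reach the target mask quality.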

     
