Zuwang Pan, Yan Gui, Yuhang Yi, Jianming Zhang. Semi-Supervised Video Object Segmentation with Global Feature Enhancement and Mask Correction[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.2023-00738

Semi-Supervised Video Object Segmentation with Global Feature Enhancement and Mask Correction


Abstract: To address the problems of low accuracy in distinguishing similar objects and the accumulation of segmentation errors in video object segmentation, we propose a semi-supervised video object segmentation method with global feature enhancement and mask correction. First, a global context-aware module enhances features by modeling their global dependencies with two global memory units, capturing global contextual information within and across video frames and thereby improving the model's ability to distinguish similar distractors. Second, a detail-aware decoder fuses encoder features through skip connections in the early decoding stage to learn detail-enhanced decoding features. Finally, a mask correction module deployed in the late decoding stage estimates uncertain regions in the coarse segmentation mask and corrects blurred object boundaries and mis-segmented regions, yielding accurate video object segmentation results. Extensive experiments on the challenging DAVIS and YouTube-VOS benchmarks show that the proposed method clearly outperforms the compared methods, improving the overall score G on the YouTube-VOS 2019 validation set by 1.6 and 0.3 over STCN and GSFM, respectively.
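The mask correction step described above hinges on first locating uncertain regions in a coarse segmentation mask. A minimal sketch of one common way to do this is entropy thresholding over per-pixel class probabilities; the function name, threshold value, and NumPy formulation below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def uncertain_regions(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Flag pixels whose predicted class distribution has high entropy.

    logits: array of shape (C, H, W) holding per-class scores for each pixel.
    Returns a boolean (H, W) mask marking uncertain pixels.
    """
    # Softmax over the class axis, numerically stabilized.
    z = logits - logits.max(axis=0, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=0, keepdims=True)
    # Normalized entropy in [0, 1]: 0 = fully confident, 1 = maximally uncertain.
    entropy = -(p * np.log(p + 1e-12)).sum(axis=0) / np.log(logits.shape[0])
    return entropy > threshold

# Example: a 2-class map with one confident and one ambiguous pixel.
logits = np.zeros((2, 1, 2))
logits[:, 0, 0] = [5.0, -5.0]   # confident foreground pixel
logits[:, 0, 1] = [0.1, 0.0]    # near-even split -> uncertain pixel
mask = uncertain_regions(logits)
```

Pixels flagged this way (typically near object boundaries or between similar-looking objects) are the candidates a correction module would then refine.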
