Citation: 司马海峰, 许毓霜, 王静, 徐明亮. An Adaptive Road Scene Semantic Segmentation Network with Refined Multi-Scale Perception and Optimized Contours[J]. Journal of Computer-Aided Design & Computer Graphics.

An Adaptive Road Scene Semantic Segmentation Network with Refined Multi-Scale Perception and Optimized Contours

  • Abstract: Semantic segmentation is usually formulated as a pixel-level classification task, whereas MaskFormer, which combines a convolutional neural network with a Transformer, formulates it as a mask-level classification task. To address MaskFormer's weak deformation modeling, blurred object contours, and slow convergence, an adaptive road scene semantic segmentation network with refined multi-scale perception and optimized contours is proposed. First, a bottleneck structure that stacks standard and deformable convolutions is used in the encoder to improve the network's deformation modeling ability. Then, a feature refinement module is adopted in the decoder to filter out irrelevant features, further improving the decoding ability of the feature pyramid network. Because up-sampled features become misaligned at the pixel level when the feature pyramid network fuses multi-level features, a feature calibration module is introduced to improve the segmentation of object contours. Finally, the Miti-DETR decoder is employed in the Transformer module to accelerate training and improve segmentation accuracy. Experimental results show that the proposed network outperforms existing semantic segmentation networks by a large margin on the Cityscapes and Mapillary Vistas datasets.
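The distinction between pixel-level and mask-level classification can be made concrete with a small sketch. The code below illustrates the general MaskFormer-style inference rule, in which each predicted mask carries one class distribution and a pixel's label is the class with the largest summed (class probability × mask probability) score. It is not the proposed network: the function names, the toy inputs, and the "road"/"car" labels are assumptions made for illustration, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mask_level_inference(class_logits, mask_logits):
    """Combine per-mask class scores and mask scores into a label map.

    class_logits: N lists of K + 1 floats; the last entry is "no object".
    mask_logits:  N grids (H x W) of floats, one mask-logit map per query.
    Returns an H x W grid of class indices in [0, K).
    """
    # Class probabilities per mask, with the trailing "no object" entry dropped.
    class_probs = [softmax(row)[:-1] for row in class_logits]
    num_masks = len(mask_logits)
    num_classes = len(class_probs[0])
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at (y, x): sum over masks of p_i(c) * m_i[y][x].
            scores = [
                sum(class_probs[i][c] * sigmoid(mask_logits[i][y][x])
                    for i in range(num_masks))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Hypothetical toy input: 2 masks, 2 real classes (0 = road, 1 = car),
# and a 1 x 2 image.
class_logits = [[4.0, 0.0, 0.0],   # mask 0 is confidently "road"
                [0.0, 4.0, 0.0]]   # mask 1 is confidently "car"
mask_logits = [[[5.0, -5.0]],      # mask 0 covers the left pixel
               [[-5.0, 5.0]]]      # mask 1 covers the right pixel
label_map = mask_level_inference(class_logits, mask_logits)
```

In a pixel-level formulation, each pixel would instead receive its own K-way class distribution directly; here the class decision is made once per mask and spread to pixels through the mask scores.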

     
