An Adaptive Road Scene Semantic Segmentation Network with Refined Multi-Scale Perception and Optimized Contours
Abstract: Semantic segmentation is usually formulated as a pixel-level classification task, whereas MaskFormer, which integrates a convolutional neural network with a Transformer, formulates it as a mask-level classification task. To address poor deformation modeling ability, blurred object contours, and slow convergence in semantic segmentation, an adaptive road scene semantic segmentation network with refined multi-scale perception and optimized contours is proposed. In the encoder, a bottleneck structure built by stacking standard and deformable convolutions improves the network's deformation modeling ability. In the decoder, a feature refinement module filters out irrelevant features, further improving the decoding ability of the feature pyramid network. To address the pixel misalignment that arises in up-sampled features when the feature pyramid network fuses multi-level features, a feature calibration module is introduced to improve the segmentation of object contours. Finally, the Miti-DETR decoder is adopted in the Transformer module to accelerate training and improve segmentation accuracy. Experimental results show that the proposed network outperforms existing semantic segmentation networks by a considerable margin on the Cityscapes and Mapillary Vistas datasets.
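The difference between pixel-level and mask-level classification can be illustrated with a small sketch. It is not the proposed network; it only shows, under stated assumptions, how a MaskFormer-style model might turn N predicted masks, each with one class distribution, into a per-pixel label map by weighting every mask with its class probabilities and taking the per-pixel argmax. The function name `semantic_inference`, the argument names, and the toy values are hypothetical and chosen for illustration; the last column of `class_probs` is assumed to be a "no object" class that is ignored.

```python
from typing import List

def semantic_inference(
    class_probs: List[List[float]],
    mask_probs: List[List[List[float]]],
) -> List[List[int]]:
    """Combine mask-level predictions into a per-pixel label map.

    class_probs: N x (K + 1) class probabilities, one row per predicted mask;
                 the last column is assumed to be "no object" and is skipped.
    mask_probs:  N x H x W mask probabilities in [0, 1].
    Returns an H x W grid of class indices in [0, K).
    """
    num_classes = len(class_probs[0]) - 1
    height = len(mask_probs[0])
    width = len(mask_probs[0][0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over masks of
            # (probability the mask is class c) * (probability the pixel is in the mask).
            scores = [
                sum(p[c] * m[y][x] for p, m in zip(class_probs, mask_probs))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels

# Toy input (illustrative values only): two masks, two classes plus "no object".
class_probs = [
    [0.90, 0.05, 0.05],  # mask 0 is most likely class 0 (say, road)
    [0.10, 0.80, 0.10],  # mask 1 is most likely class 1 (say, car)
]
mask_probs = [
    [[0.9, 0.8, 0.1], [0.9, 0.2, 0.1]],   # mask 0 covers the left side
    [[0.1, 0.1, 0.9], [0.1, 0.7, 0.95]],  # mask 1 covers the right side
]
label_map = semantic_inference(class_probs, mask_probs)
print(label_map)
```

In a pixel-level formulation each pixel would carry its own class distribution; here the class decision is made once per mask, and the pixels only decide which mask they belong to.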