Liu Qiang, He Zifen, Zhang Yinhui. Semantic Segmentation of Mechanical Workshop Scenes with Branch-Atrous Convolutional Neural Networks[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 126-141. DOI: 10.3724/SP.J.1089.2021.18383

Semantic Segmentation of Mechanical Workshop Scenes with Branch-Atrous Convolutional Neural Networks

Abstract: Semantic segmentation of mechanical workshop scenes is a key technology for developing autonomous guided vehicles (AGVs) for industrial settings. An AGV must accurately identify passable and impassable areas, yet workshop scenes contain many densely packed target categories, which makes accurate segmentation difficult. To address this, a branch-atrous convolutional neural network model based on the DeepLabv3 architecture is proposed. Building on a pre-trained ResNet-50 residual network, the model first adds a branch structure in which atrous convolutions with different dilation rates adjust the receptive field of the feature maps and capture context information at several receptive-field sizes. It then stacks atrous convolutions with the same dilation rate to mitigate the gridding (checkerboard) effect of atrous convolution and reduce the loss of context information. Finally, it adds a decoder unit for multi-scale feature fusion, which fuses shallow features that localize targets accurately with deep features that classify them accurately, compensating for the loss of context information and the lack of correlation between pixels caused by gridding. Experiments on a self-built small-sample dataset of mechanical workshop scenes show that, compared with DeepLabv3, the model improves validation accuracy by 5.14% and segments passable areas, road lines, and impassable areas more accurately.
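
As an illustration of how the dilation rate adjusts the receptive field, the following minimal sketch computes the span of a single atrous kernel and the receptive field of a stack of stride-1 atrous layers. It uses the standard relation k_eff = k + (k - 1)(d - 1). The 3x3 kernel size and the rates 1, 2, 4, 8 are assumptions chosen for illustration; the abstract does not state the rates used in the model, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one atrous (dilated) kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a stack of stride-1 convolution layers.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective kernel size - 1).
        rf += effective_kernel_size(kernel_size, dilation) - 1
    return rf


# Hypothetical branch rates; the abstract does not list the actual values.
branch_rates = [1, 2, 4, 8]
for rate in branch_rates:
    single = receptive_field([(3, rate)])
    stacked = receptive_field([(3, rate), (3, rate)])
    print(f"rate={rate}: one 3x3 layer spans {single} px, two stacked span {stacked} px")
```

Under these assumptions, each branch sees a differently sized neighborhood of the input, which is the sense in which a branch structure can collect context at several receptive-field sizes. The sketch only counts the extent of the field; it does not model which pixels inside that extent are actually sampled, so it says nothing about the gridding effect itself.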

     
