Abstract:
To address the partial loss of semantic information and the low accuracy of boundary localization that arise when convolutional neural networks are used for image semantic segmentation, this paper constructs a convolutional neural network that combines an attention mechanism with multi-scale features. The model first uses the attention mechanism to weight and fuse the multi-scale features extracted by the network. It then aggregates multi-scale object information with dilated convolution and global average pooling. Finally, it refines the segmentation boundary with a boundary fine-grained feature extraction module. Experimental results on the multi-scale PASCAL VOC2012 dataset and the high-resolution Cityscapes dataset show that the proposed network segments significantly better than the ResNet-101 backbone baseline, improving the mean Intersection over Union (mIoU) by 12.2% and 9.3%, respectively.
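The reported metric, mean Intersection over Union (mIoU), averages the per-class score IoU = TP / (TP + FP + FN) over the segmentation classes. The following is a minimal pure-Python sketch of that computation, not the authors' evaluation code. It assumes flattened integer label lists with predictions in the range [0, num_classes), an ignore label of 255 (the void convention commonly used with PASCAL VOC and Cityscapes train IDs), and that classes absent from both prediction and ground truth are skipped. The function names are illustrative.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels per (ground-truth class, predicted class) pair."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # skip void / unlabeled pixels
            continue
        cm[t][p] += 1  # rows: ground truth, columns: prediction
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN)."""
    cm = confusion_matrix(pred, target, num_classes, ignore_index)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, wrong
        fn = sum(cm[c]) - tp                                 # missed pixels of c
        denom = tp + fp + fn
        if denom == 0:  # class absent in both prediction and ground truth
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Usage: class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIoU = 7/12.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

For dataset-level mIoU, the usual convention is to accumulate one confusion matrix over all images (here, by concatenating their flattened labels) before computing per-class IoU, rather than averaging per-image scores.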