Semantic Segmentation of Remote Sensing Images via Self-Attention-Based Multi-Scale Feature Fusion
Abstract: Semantic segmentation of remote sensing images is often incomplete and inaccurate because the images have complex content, large differences in object scale, and uneven object distribution. To address these problems, we propose a semantic segmentation algorithm for remote sensing images that uses self-attention for multi-scale feature fusion. The algorithm follows an encoder-decoder structure: the encoder uses the Swin-Transformer model to extract complex multi-scale features, and the decoder consists of a self-attention multi-scale feature fusion module and a feature pyramid network. The extracted features at each scale are first adjusted to a common scale and then fed into the self-attention multi-scale feature fusion module, which fuses them so that feature information at different scales is fully utilized in semantic segmentation. Next, the feature pyramid further superimposes and fuses the outputs of the self-attention fusion module from top to bottom. Finally, the segmentation result is predicted. In comparisons with mainstream algorithms on the public remote sensing semantic segmentation dataset LoveDA, the proposed algorithm achieves a mean intersection over union of 52.77% under the single-scale strategy, 1.42% higher than the second-best result, and 54.19% under the multi-scale strategy, 1.47% higher than the second-best result.
The experiments demonstrate that the proposed algorithm can effectively fuse multi-scale features to improve segmentation accuracy.
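To make the idea of fusing multi-scale features with self-attention more concrete, the following is a minimal sketch, not the module proposed in the paper. It assumes the features from the different scales have already been resized to a common resolution, and it treats the feature vectors of one pixel, one per scale, as the tokens of a scaled dot-product self-attention. The function names `softmax` and `fuse_scales` are hypothetical, and the learned query, key, and value projections of a real attention layer are replaced by the identity for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(tokens):
    """Fuse per-scale feature vectors of a single pixel with self-attention.

    tokens: list of S vectors (one per scale), each of length C.
    Returns S fused vectors; each is an attention-weighted average of all
    S inputs. Assumption: identity Q/K/V projections (no learned weights).
    """
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        # Scaled dot-product similarity between this scale and every scale.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Weighted sum of the per-scale vectors, channel by channel.
        fused.append(
            [sum(w * value[c] for w, value in zip(weights, tokens))
             for c in range(dim)]
        )
    return fused


if __name__ == "__main__":
    # Three scales, two channels each, for one pixel (illustrative values).
    pixel_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for vector in fuse_scales(pixel_features):
        print(vector)
```

Because the attention weights for each query sum to one, every fused vector is a convex combination of the per-scale inputs, so each scale can draw on information from all the others. In a full model this step would run over every pixel and be followed by the top-down feature pyramid described in the abstract.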