A Review of Semantic Segmentation Methods Based on RGB-D Images
Abstract: Semantic segmentation aims to accurately identify and segment each object or scene region in an image. Methods based on RGB images alone are limited in the information they can exploit, which constrains their performance. With the spread of depth sensors, depth maps supply semantic segmentation networks with rich geometric information and significantly improve segmentation accuracy. This paper reviews recent progress and related methods in semantic segmentation based on RGB-D images. According to how they process multimodal fusion features, these methods are grouped into three categories: single-branch, dual-branch, and three-branch network architectures. Single-branch networks process RGB and depth features together in the same branch so that the features are combined organically; dual-branch networks exploit the complementarity between RGB and depth features to improve the rectification and fusion of multimodal features; three-branch networks retain the original RGB and depth features while further mining the fused features, so that information is preserved comprehensively. The paper also summarizes key techniques such as attention mechanisms and model optimization, reviews commonly used datasets and evaluation metrics, and compares the performance of the methods on different datasets. Finally, it summarizes the challenges that RGB-D image semantic segmentation currently faces in the interaction and processing of multimodal data, and discusses the prospects of semantic segmentation technology for cross-domain data fusion.
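The three categories named in the abstract differ mainly in where the RGB and depth streams meet. The following is a minimal sketch of that data flow only, not an implementation of any surveyed method. The functions `encode` and `fuse`, the scale factors, and the use of flat lists as "features" are all hypothetical stand-ins chosen for illustration; real networks use learned convolutional or transformer encoders and far more elaborate fusion modules.

```python
# Toy illustration only: "features" are flat lists of floats, and
# encode/fuse are hypothetical stand-ins for learned network modules.

def encode(x, scale):
    """Hypothetical encoder: scales every value by a fixed factor."""
    return [scale * v for v in x]

def fuse(a, b):
    """Hypothetical fusion: element-wise sum of two equal-length lists."""
    return [u + v for u, v in zip(a, b)]

def single_branch(rgb, depth):
    # One branch handles both modalities: the inputs are concatenated
    # and passed through a single shared encoder.
    return encode(rgb + depth, 0.5)

def dual_branch(rgb, depth):
    # Two modality-specific branches whose outputs are fused afterwards.
    return fuse(encode(rgb, 0.5), encode(depth, 2.0))

def three_branch(rgb, depth):
    # The RGB and depth branches are retained, and a third branch
    # processes the fused features further.
    f_rgb = encode(rgb, 0.5)
    f_depth = encode(depth, 2.0)
    f_fused = encode(fuse(f_rgb, f_depth), 1.0)
    return f_rgb, f_depth, f_fused

if __name__ == "__main__":
    rgb, depth = [1.0, 2.0], [3.0, 4.0]
    print(single_branch(rgb, depth))
    print(dual_branch(rgb, depth))
    print(three_branch(rgb, depth))
```

In this sketch the single-branch variant returns one combined feature, the dual-branch variant returns one fused feature built from two separate encoders, and the three-branch variant returns the two modality-specific features alongside the fused one, which mirrors the abstract's point that the original RGB and depth features are preserved.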