A Review of Semantic Segmentation Methods Based on RGB-D Images
-
Abstract
Semantic segmentation aims to accurately identify and segment every object or region in an image. As depth sensors have become widespread, depth maps have brought rich geometric information into semantic segmentation networks and substantially improved segmentation accuracy. This paper reviews recent progress and representative methods in semantic segmentation based on RGB-D images. According to how they process multimodal fusion features, these methods are grouped into three categories: single-branch, dual-branch, and three-branch network architectures. The single-branch network processes RGB and depth features in the same branch so that the two are combined within a single stream; the dual-branch network exploits the complementarity between RGB and depth features to optimize the correction and fusion of multimodal features; and the three-branch network explores the fused features more thoroughly while retaining the original RGB and depth features, keeping the information as complete as possible. The paper also summarizes key techniques such as attention mechanisms and model optimization, reviews commonly used datasets and evaluation metrics, and compares the performance of various methods on different datasets. Finally, it summarizes the current challenges that RGB-D semantic segmentation faces in the interaction and processing of multimodal data, and discusses the prospects for semantic segmentation in cross-domain data fusion.
-
-