Multi-domain Structural Information Guided Indoor Self-supervised Depth Estimation
Abstract
Textureless regions, which lack distinct visual features, remain a persistent challenge for self-supervised indoor depth estimation. Current mainstream methods rely primarily on geometric priors in the spatial domain and neglect depth structure, so they capture only limited information. To address this problem, this paper proposes a multi-domain structural information guided indoor self-supervised depth estimation method (SIG-Depth). SIG-Depth leverages the gradient domain and the frequency domain to capture boundaries and details in pixel space, thereby enhancing the network’s perception of high-frequency structures and enabling accurate depth estimation in challenging areas such as overlapping textureless regions and small objects. Specifically, we first design a gradient augment module that uses precise gradient priors to sharpen the depth structure of the input image in the latent space. We then construct a frequency alignment module within the encoder-decoder framework to explicitly capture the frequency information missing in the decoder. Finally, we introduce gradient-domain and frequency-domain losses to guide the learning process and design a low-pass filter to better balance the high- and low-frequency spectra. Experimental results on the NYUv2, ScanNet, InteriorNet, and KITTI benchmark datasets demonstrate that the proposed method estimates depth details more accurately than state-of-the-art methods, achieving 86.6%, 81.2%, 75.0%, and 87.3%, respectively, on the key accuracy metric δ < 1.25.
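As a rough illustration of what gradient-domain and frequency-domain loss terms with a low-pass filter can look like, the NumPy sketch below compares two depth maps. It is not the SIG-Depth formulation, which the abstract does not specify: the finite-difference gradients, the L1 distances, the Gaussian mask, and the names `gradient_loss`, `gaussian_low_pass`, `frequency_loss`, `sigma`, and `high_weight` are all assumptions made for this example.

```python
import numpy as np


def gradient_loss(pred, target):
    """Mean absolute difference between finite-difference gradients (assumed form)."""
    dx_p, dy_p = np.diff(pred, axis=1), np.diff(pred, axis=0)
    dx_t, dy_t = np.diff(target, axis=1), np.diff(target, axis=0)
    return np.abs(dx_p - dx_t).mean() + np.abs(dy_p - dy_t).mean()


def gaussian_low_pass(shape, sigma):
    """Gaussian low-pass mask on the unshifted 2-D FFT grid; equals 1 at the DC term."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))


def frequency_loss(pred, target, sigma=0.1, high_weight=1.0):
    """L1 distance between amplitude spectra, split into low and high bands.

    `high_weight` (hypothetical) rebalances the high-frequency band
    against the low-frequency band selected by the Gaussian mask.
    """
    amp_diff = np.abs(np.abs(np.fft.fft2(pred)) - np.abs(np.fft.fft2(target)))
    low = gaussian_low_pass(pred.shape, sigma)
    low_term = (low * amp_diff).mean()
    high_term = ((1.0 - low) * amp_diff).mean()
    return low_term + high_weight * high_term


# Toy depth maps: a flat plane versus a plane with a vertical depth edge.
flat = np.zeros((4, 4))
step = np.zeros((4, 4))
step[:, 2:] = 1.0

g_loss = gradient_loss(flat, step)
f_loss = frequency_loss(flat, step)
```

In this toy case both terms are nonzero because the depth edge is present in only one of the maps, and both vanish when the two maps are identical. Raising `high_weight` above 1 would penalize missing high-frequency structure more heavily than low-frequency error.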