Multi-domain Structural Information Guided Indoor Self-supervised Depth Estimation
Abstract: Textureless regions, which lack distinct visual features, have long been a challenge for self-supervised indoor depth estimation. Current mainstream methods rely primarily on geometric priors in the spatial domain and neglect depth structure, so they capture only limited information. To address this, this paper proposes a multi-domain structural information guided indoor self-supervised depth estimation method (SIG-Depth). SIG-Depth exploits the gradient and frequency domains to capture boundaries and fine details in pixel space, strengthening the network's perception of high-frequency structures and enabling accurate depth estimation in challenging areas such as overlapping textureless regions and small objects. Specifically, we first design a gradient augment module that sharpens the depth structure of the input image in the latent space using precise gradient priors; we then construct a frequency alignment module within the encoder-decoder framework to explicitly capture the frequency information missing from the decoder; finally, we introduce gradient-domain and frequency-domain losses to guide learning and design a low-pass filter to better balance the high- and low-frequency spectra. Experimental results on the NYUv2, ScanNet, InteriorNet, and KITTI benchmark datasets show that the proposed method estimates depth details more accurately than state-of-the-art methods, reaching 86.6%, 81.2%, 75.0%, and 87.3%, respectively, on the key accuracy metric δ<1.25.
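To make the multi-domain supervision concrete, the sketch below gives one possible PyTorch formulation of a gradient-domain loss (Sobel gradients) and a frequency-domain loss (FFT amplitude spectrum modulated by a Gaussian low-pass mask). The function names, the Sobel kernels, and the Gaussian weighting are illustrative assumptions only, not the exact formulation used in SIG-Depth.

# Illustrative sketch only; not the SIG-Depth implementation.
import torch
import torch.nn.functional as F

def gradient_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # L1 distance between Sobel gradients of two depth maps shaped (B, 1, H, W).
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=pred.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx_p, gy_p = F.conv2d(pred, sobel_x, padding=1), F.conv2d(pred, sobel_y, padding=1)
    gx_t, gy_t = F.conv2d(target, sobel_x, padding=1), F.conv2d(target, sobel_y, padding=1)
    return (gx_p - gx_t).abs().mean() + (gy_p - gy_t).abs().mean()

def frequency_loss(pred: torch.Tensor, target: torch.Tensor, sigma: float = 0.25) -> torch.Tensor:
    # L1 distance between FFT amplitude spectra, modulated by a Gaussian low-pass mask;
    # sigma controls how strongly high frequencies are attenuated in the loss.
    amp_p = torch.fft.rfft2(pred, norm="ortho").abs()
    amp_t = torch.fft.rfft2(target, norm="ortho").abs()
    fy = torch.fft.fftfreq(pred.shape[-2], device=pred.device).view(-1, 1)
    fx = torch.fft.rfftfreq(pred.shape[-1], device=pred.device).view(1, -1)
    lowpass = torch.exp(-(fx ** 2 + fy ** 2) / (2 * sigma ** 2))  # shape (H, W//2 + 1)
    return ((amp_p - amp_t).abs() * lowpass).mean()

In a self-supervised setting the reference depth would itself come from the network (for example, a prediction warped from an adjacent view), and both terms would typically be added to the photometric objective with small weights.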