Self-Supervised Depth Estimation of Monocular Thermal Images Based on Feature Transformation
Abstract: Self-supervised depth estimation of thermal images based on an encoder-decoder structure has demonstrated high accuracy and reliability in low-light scenes. However, owing to the physical characteristics of thermal images, such as a low signal-to-noise ratio and limited contrast, encoder-decoder methods struggle both to generate effective supervisory signals during network training and to accurately recover long-distance information, which degrades the precision of the depth map. To address these problems, we propose a self-supervised monocular thermal image depth estimation method based on feature transformation. First, the raw thermal image features extracted by the encoder are transformed and linearly combined to produce an uncertain feature map, which enhances the network's sensitivity to features and improves the model's robustness in low-light scenes. Then, a normalization module is designed to optimize the decoder so that it makes better use of the feature information and generates high-quality depth maps. Finally, the proposed method is compared with existing methods on the ViViD++ dataset. Experimental results show that it reduces the absolute error by 12%, the relative error by 10%, and the root mean square error by 11%. The proposed method therefore achieves high accuracy, improving the quality of depth maps while also exhibiting good robustness.
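As a rough illustration of how error reductions of this kind could be computed, the following Python sketch assumes that the three metrics are the mean absolute error, the mean absolute relative error, and the root mean square error over pixels with valid ground-truth depth. The abstract does not define the metrics precisely, so these definitions, the function names `depth_errors` and `relative_reduction`, and the sample values are all hypothetical.

```python
import math


def depth_errors(pred, gt):
    """Return (abs_err, rel_err, rmse) over pixels whose ground truth is valid.

    Assumption: a ground-truth depth of 0 or less marks a pixel with no
    measurement, so such pixels are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)
    abs_err = sum(abs(p - g) for p, g in pairs) / n        # mean absolute error
    rel_err = sum(abs(p - g) / g for p, g in pairs) / n    # mean absolute relative error
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return abs_err, rel_err, rmse


def relative_reduction(baseline, proposed):
    """Fractional reduction of an error metric relative to a baseline value."""
    return (baseline - proposed) / baseline


# Illustrative values only; they are not taken from the paper.
pred = [2.0, 4.0, 9.0, 3.0]
gt = [1.0, 5.0, 10.0, 0.0]  # the last pixel has no ground truth and is ignored
abs_err, rel_err, rmse = depth_errors(pred, gt)
```

Under these assumptions, a statement such as "reduces the absolute error by 12%" corresponds to `relative_reduction(baseline_abs_err, proposed_abs_err)` being about 0.12, where both errors are measured on the same test split.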