Depth Map Super-Resolution Reconstruction Based on Hybrid Multimodal Attention Fusion Correction

侯兵; 董秀成; 杨陈成; 雍萧; 雎雅玲

doi:10.3724/SP.J.1089.2024-00105


Abstract

Depth maps captured by low-cost depth cameras typically suffer from several kinds of degradation, including blur and low resolution. To address this, a hybrid multimodal attention fusion correction network is proposed that is guided by a high-resolution color image of the same scene. To make effective use of the structural information in the color image, an attention mechanism iteratively aligns and fuses depth features with color structural features, and the fused features are then corrected using the color structural features and a perceived edge map. First, the depth edge-aware module combines the low-resolution depth map with a color edge map to extract a high-resolution depth edge map. Next, the side-window edge-preserving module extracts the useful structural information from the guiding color image and reduces the influence of redundant texture. Finally, in the multimodal attention fusion correction module, an attention mechanism extracts the structural attention distributions of the depth features and the color structural features so that the two modalities are aligned and fused, and spatial attention weights derived from the color structural features and the high-resolution depth edge map strengthen the supervision of depth edges, reducing blur and artifacts. In experiments on the NYU v2 dataset and the Middlebury dataset (2005), the proposed method achieves average root mean square errors (RMSE) of 2.96 and 1.45, respectively, which are 0.10 and 0.08 lower than those of the second-best method. The network reconstructs high-resolution depth maps with sharper edges and fewer artifacts.
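The reported results are root mean square errors between reconstructed and ground-truth depth maps. As a reading aid, the metric can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not state the depth units, how invalid pixels are handled, or what the reported averages are taken over, so the optional `valid` mask and the helper names `rmse` and `mean_rmse` are assumptions made here for illustration.

```python
import math


def rmse(pred, target, valid=None):
    """Root mean square error between two depth maps given as 2-D lists.

    `valid` is an optional 2-D list of booleans (hypothetical; the source
    does not describe any masking). Pixels marked False are skipped.
    """
    sq_sum = 0.0
    count = 0
    for r, (pred_row, target_row) in enumerate(zip(pred, target)):
        for c, (p, t) in enumerate(zip(pred_row, target_row)):
            if valid is not None and not valid[r][c]:
                continue  # ignore pixels without usable ground truth
            sq_sum += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no valid pixels to evaluate")
    return math.sqrt(sq_sum / count)


def mean_rmse(pairs):
    """Average the per-image RMSE over (prediction, ground truth) pairs."""
    scores = [rmse(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)
```

A lower value means the reconstructed depth is closer to the ground truth, which is why the abstract reports its gains as reductions of 0.10 and 0.08.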
