Citation: Bing Hou, Xiucheng Dong, Chencheng Yang, Xiao Yong, Yaling Ju. Depth Map Super-Resolution Reconstruction Based On Hybrid Multimodal Attention Fusion Correction[J]. Journal of Computer-Aided Design & Computer Graphics. DOI: 10.3724/SP.J.1089.2024-00105


Depth Map Super-Resolution Reconstruction Based On Hybrid Multimodal Attention Fusion Correction

  • Abstract: Depth maps captured by low-cost depth cameras typically suffer from multiple degradations such as blurring and low resolution. To address this, a hybrid multimodal attention fusion correction network is proposed that jointly exploits a high-resolution color image of the same scene. To make effective use of the structural information in the color image, an attention mechanism iteratively aligns and fuses the depth features with the color structure features, and the fused features are then corrected using the color structure features and a perceptual edge map. First, a depth edge-aware module extracts a high-resolution depth edge map from the low-resolution depth map and the color edge map. Second, a side-window edge-preserving module extracts the useful structural information from the guiding color image, reducing the influence of redundant texture. Finally, a multimodal attention fusion correction module uses the attention mechanism to extract the structural attention distributions of the depth features and the color structure features so that the two modalities are aligned and fused, and uses the spatial attention weights of the color structure features and the high-resolution depth edge map to strengthen supervision of depth edges, reducing blurring and artifacts. On the NYU v2 dataset and the Middlebury (2005) dataset, the average root mean square error is 2.96 and 1.45 respectively, 0.10 and 0.08 lower than the second-best method; the network reconstructs high-resolution depth maps with sharper edges and fewer artifacts.
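The side-window edge-preserving idea referenced above (filtering with half/quarter windows anchored at the pixel instead of a centered window) can be sketched as follows. This is a minimal illustrative NumPy implementation of generic side-window mean filtering, not the paper's module; the function name and window layout are assumptions.

```python
import numpy as np

def side_window_mean(img, r=1):
    """One pass of side-window mean filtering: for each pixel, average over
    eight side windows (halves and corner quarters) that all contain the
    pixel, and keep the mean closest to the original value. Because at least
    one side window lies entirely on the pixel's side of an edge, edges are
    preserved far better than with a centered box filter."""
    h, w = img.shape
    pad = np.pad(img, r, mode='edge')
    # (row0, row1, col0, col1) offsets of each side window relative to the pixel
    windows = [(-r, 0, -r, r), (0, r, -r, r),    # up, down halves
               (-r, r, -r, 0), (-r, r, 0, r),    # left, right halves
               (-r, 0, -r, 0), (-r, 0, 0, r),    # corner quarters
               (0, r, -r, 0), (0, r, 0, r)]
    out = img.astype(float).copy()
    best = np.full((h, w), np.inf)
    for r0, r1, c0, c1 in windows:
        acc = np.zeros((h, w))
        n = 0
        for dr in range(r0, r1 + 1):
            for dc in range(c0, c1 + 1):
                acc += pad[r + dr: r + dr + h, r + dc: r + dc + w]
                n += 1
        mean = acc / n
        diff = np.abs(mean - img)
        mask = diff < best          # keep the side window closest to the input
        out[mask] = mean[mask]
        best[mask] = diff[mask]
    return out
```

On a piecewise-constant step image the filter reproduces the step exactly, since the half window on the correct side of the edge has zero deviation from the pixel value.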


    Abstract: To address the blurring and low resolution of depth maps captured by low-cost depth cameras, a hybrid multimodal attention fusion correction network (HMAFCN) is proposed that leverages a high-resolution color image of the same scene. To make effective use of the structural information in the color image, HMAFCN uses an attention mechanism to iteratively align and fuse the depth features with the color structure features, and corrects the fused features through the color structure features and a perceptual edge map. First, the depth edge-aware block extracts a high-resolution depth edge map by combining the low-resolution depth map with the color edge map. Second, the side-window edge-preserving block extracts the useful structural information from the guiding color image to reduce the influence of redundant texture. Finally, the multimodal attention fusion correction block uses the attention mechanism to extract the structural attention distributions of the depth features and the color structure features so that the two modalities can be aligned and fused, and uses the spatial attention weights of the color structure features and the high-resolution depth edge map to strengthen supervision of depth edges, reducing blurring and artifacts. On the NYU v2 and Middlebury (2005) datasets, the average root mean square error is 2.96 and 1.45 respectively, 0.10 and 0.08 lower than the second-best method. The experimental results show that the proposed method can reconstruct high-resolution depth maps with sharper edges and fewer artifacts.
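The fusion-then-correction flow described in the abstract can be sketched in a few lines. This is a hypothetical, simplified single-channel NumPy version: the sigmoid gating, the blend form, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_fuse(depth_feat, color_feat, edge_map):
    """Illustrative attention-guided fusion and correction.

    depth_feat, color_feat: same-shape feature maps for the two modalities.
    edge_map: high-resolution depth edge map in [0, 1].

    A spatial attention map decides, per pixel, how much color structure is
    injected into the depth features; the correction step then re-anchors
    the result to the depth features away from edges, where color guidance
    would mostly add texture artifacts.
    """
    # spatial attention in (0, 1): larger near edges supported by color structure
    attn = sigmoid(color_feat * edge_map)
    # aligned fusion: per-pixel blend of the two modalities
    fused = attn * color_feat + (1.0 - attn) * depth_feat
    # correction: keep fused features only where the edge map is confident
    corrected = edge_map * fused + (1.0 - edge_map) * depth_feat
    return corrected
```

With an all-zero edge map the correction step returns the depth features unchanged, which makes the edge map's role as a gate explicit.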

