Citation: MIAO Yong-wei, ZHANG Xin-jie, REN Han-shi, ZHANG Jia-jing, SUN Shu-sen. A Channel Multi-Scale Fusion Network for Scene Depth Map Super-Resolution[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(1): 37-47. DOI: 10.3724/SP.J.1089.2023.19328

A Channel Multi-Scale Fusion Network for Scene Depth Map Super-Resolution

Abstract: Scene depth maps captured by consumer-grade depth cameras typically suffer from low resolution and blurring. To overcome these limitations, a scene depth map super-resolution network based on channel multi-scale fusion, CMSFN, is proposed, guided by the corresponding high-resolution color image. To make effective use of the multi-scale information in the scene depth map, CMSFN adopts a multi-scale pyramid structure; at each pyramid level, the resolution of the depth map is raised by channel multi-scale up-sampling combined with residual learning. First, at each level of the network, the depth feature map and the color feature map of the same scale are fused through dense connections, so that color-depth features are reused and the structural information of the scene is fully integrated. Second, the fused depth feature map is divided into multiple channel groups at different scales, which provides receptive fields of different sizes and allows the network to capture feature information effectively at multiple scales. Finally, global and local residual structures are added to CMSFN, so that the network recovers the high-frequency residual information of the scene depth map while alleviating gradient vanishing. On group A of the Middlebury dataset, the average super-resolution root mean square error (RMSE) of CMSFN is 1.33, which is 6.99% and 26.92% lower than that of MFR and PMBANet, respectively; on group B of the Middlebury dataset, the average RMSE of CMSFN is 1.41, which is 9.03% and 17.05% lower than that of MFR and PMBANet, respectively. Experimental results show that CMSFN can effectively recover the structural information of scene depth maps.
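The two architectural ideas the abstract highlights, dense fusion of same-scale color and depth features and a channel multi-scale split that gives different channel groups different receptive fields, can be sketched compactly. The PyTorch snippet below is a minimal illustration only: the module names, channel sizes, and the Res2Net-style hierarchical split are assumptions made here for clarity, not the paper's actual CMSFN implementation.

```python
# Minimal sketch of (1) dense fusion of same-scale color and depth features and
# (2) a channel multi-scale split so that channel groups see different receptive
# fields. All names and sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class ChannelMultiScaleBlock(nn.Module):
    """Split channels into groups; each later group passes through one more 3x3
    conv than the previous one, so it effectively has a larger receptive field."""
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        width = channels // groups
        # group 0 is an identity path; groups 1..g-1 each get a 3x3 conv
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, 3, padding=1) for _ in range(groups - 1)]
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        splits = torch.chunk(x, self.groups, dim=1)
        out, prev = [splits[0]], splits[0]
        for conv, s in zip(self.convs, splits[1:]):
            prev = self.act(conv(s + prev))   # reuse the previous group's output
            out.append(prev)
        return torch.cat(out, dim=1) + x      # local residual connection

class ColorDepthFusion(nn.Module):
    """Concatenate same-scale color and depth features, compress, then refine."""
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.msb = ChannelMultiScaleBlock(channels)

    def forward(self, depth_feat, color_feat):
        fused = self.fuse(torch.cat([depth_feat, color_feat], dim=1))
        return self.msb(fused)

if __name__ == "__main__":
    d = torch.randn(1, 64, 32, 32)   # low-resolution depth features (placeholder)
    c = torch.randn(1, 64, 32, 32)   # same-scale color features (placeholder)
    print(ColorDepthFusion(64)(d, c).shape)  # torch.Size([1, 64, 32, 32])
```

In the full network described by the abstract, a block of this kind would sit at each pyramid level, with channel multi-scale up-sampling between levels and a global residual path over the whole network.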

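The quantitative comparison in the abstract is stated as an average root mean square error (RMSE) and a percentage reduction relative to baseline networks. The short sketch below shows how these two quantities are computed; the random depth arrays and the baseline value in the demo are placeholders for illustration, not figures taken from the paper.

```python
# RMSE between a super-resolved depth map and ground truth, and the relative
# reduction used when comparing two methods. Inputs here are placeholders.
import numpy as np

def rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_reduction(ours: float, baseline: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

if __name__ == "__main__":
    gt = np.random.rand(256, 256).astype(np.float32)           # placeholder ground truth
    pred = gt + 0.01 * np.random.randn(256, 256).astype(np.float32)
    print(f"RMSE: {rmse(pred, gt):.4f}")
    # e.g. an RMSE of 1.33 is roughly 7% below a hypothetical baseline RMSE of 1.43
    print(f"reduction: {relative_reduction(1.33, 1.43):.2f}%")
```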