DL-Diff: Advancing Low-Light Video Enhancement through Dark-to-Light Diffusion Models
Abstract: To address insufficient detail recovery and the lack of spatiotemporal consistency in low-light video enhancement (LLVE), this paper proposes DL-Diff, a dark-to-light diffusion model for LLVE. The method first builds its base model on a pre-trained latent diffusion model, reformulating LLVE as a conditional video-to-video generation task. It then introduces two synergistic components: (1) a restoration component that learns the mapping from the low-light to the normal-light domain, and (2) a temporal component that enforces consistency between consecutive frames. In addition, a multi-stage training pipeline is designed that optimizes the network parameters in separate steps to progressively improve restoration quality. Extensive experiments are conducted on the paired DID and SDSD datasets. Quantitative results show that DL-Diff achieves the best or second-best scores on every metric; on the DID dataset in particular, it reaches an FID of 41.29 and an LPIPS of 0.17 (spatial metrics), as well as an AB(Var) of 25.40 and an MABD of 0.08 (temporal metrics), surpassing other LLVE methods. Qualitative results further confirm that DL-Diff generates bright videos with both spatial alignment and temporal continuity, and that its visual quality is superior to that of competing methods.
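
For illustration only, the sketch below shows one plausible way to compute the two temporal statistics mentioned above, the average-brightness variance AB(Var) and the mean absolute brightness difference MABD, from a frame sequence. The exact definitions used in the paper's evaluation protocol may differ (e.g., per-channel computation or comparison against ground-truth brightness curves), so the function name and formulas here should be read as assumptions, not as the official evaluation code.

import numpy as np

def temporal_brightness_stats(frames):
    """Simple temporal-consistency statistics for a video.

    frames: array of shape (T, H, W, 3), pixel values in [0, 1] or [0, 255].
    Returns (ab_var, mabd):
      ab_var -- variance of per-frame average brightness (assumed reading of AB(Var))
      mabd   -- mean absolute difference of average brightness between adjacent
                frames (assumed, simplified reading of MABD)
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Per-frame average brightness over all pixels and channels.
    avg_brightness = frames.mean(axis=(1, 2, 3))      # shape (T,)
    ab_var = avg_brightness.var()                     # flicker in overall exposure
    mabd = np.abs(np.diff(avg_brightness)).mean()     # frame-to-frame brightness jumps
    return ab_var, mabd

if __name__ == "__main__":
    # Toy example: 8 random 64x64 RGB frames.
    video = np.random.rand(8, 64, 64, 3)
    print(temporal_brightness_stats(video))

Lower values of both statistics indicate steadier brightness across frames, which is why they are reported alongside the spatial metrics FID and LPIPS when judging temporal coherence.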