Abstract:
To address insufficient detail recovery and spatiotemporal inconsistency in low-light video enhancement (LLVE), this thesis proposes DL-Diff, a novel dark-to-bright diffusion model. The framework first formulates LLVE as a conditional video-to-video generation task built on a pre-trained latent diffusion model. It then incorporates two synergistic components: a restoration component that learns the mapping from the low-light domain to the normal-light domain, and a temporal component that maintains inter-frame consistency. Furthermore, this thesis develops a multi-stage training strategy that progressively optimizes the network parameters to improve video quality. Extensive experiments on the paired DID and SDSD datasets demonstrate DL-Diff’s superior performance. On the DID benchmark, DL-Diff achieves state-of-the-art results, with spatial metrics of 41.29 (FID) and 0.17 (LPIPS) and temporal metrics of 25.40 (AB(Var)) and 0.08 (MABD), outperforming existing LLVE approaches. Qualitative evaluations confirm that, compared with competing methods, DL-Diff generates visually pleasing videos with improved spatial alignment and temporal coherence.