Multiple Conditions-guided Pre-trained Diffusion Model for Controllable Dance Video Generation

  • Abstract: To generate dance videos tailored to a user's preferences directly from user-provided music, a performer image, a dance style, and similar conditions, this paper proposes a dance video generation model in which multiple conditions guide a pre-trained diffusion model. First, a music condition encoding and control module is designed to introduce music as a condition into the pre-trained diffusion model; music features guide the denoising process so that the generated dance video frames match the music. Second, a visual contextual attention module is designed for the performer image condition: a ControlNet-based performer image control module captures local features of the performer image, and a cross-attention mechanism passes these features into the diffusion process so that the generated dance images keep the performer's appearance consistent. Finally, a text prompt design strategy is proposed to guide the pre-trained diffusion model toward higher-quality dance images. Experiments on the music-dance video dataset AIST verify the effectiveness of the proposed model. Compared with the baseline model, it improves the image quality metrics SSIM and PSNR by 10.24% and 7.04%, respectively, and improves the video quality metric IS and the audio-video alignment metric AV-Align by 6.18% and 17.16%, respectively.
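
The abstract states that performer image features are passed into the diffusion process through a cross-attention mechanism but gives no implementation details. The following is a minimal NumPy sketch of generic scaled dot-product cross-attention, not the authors' module: the function name `cross_attention`, the token counts, the feature sizes, and the random projection matrices are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Generic cross-attention: queries come from the diffusion latents,
    keys and values come from the condition (e.g. performer image features).

    Hypothetical helper for illustration; the paper's actual module is not
    specified in the abstract.
    """
    q = latent_tokens @ w_q                      # (n_latent, d_attn)
    k = cond_tokens @ w_k                        # (n_cond, d_attn)
    v = cond_tokens @ w_v                        # (n_cond, d_attn)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n_latent, n_cond)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ v, weights


# Assumed toy sizes: 16 latent tokens, 5 condition tokens.
rng = np.random.default_rng(0)
d_latent, d_cond, d_attn = 8, 6, 4
latent_tokens = rng.normal(size=(16, d_latent))
cond_tokens = rng.normal(size=(5, d_cond))
w_q = rng.normal(size=(d_latent, d_attn))
w_k = rng.normal(size=(d_cond, d_attn))
w_v = rng.normal(size=(d_cond, d_attn))

out, weights = cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

In this sketch each latent token receives a weighted mixture of condition features, which is the general way a condition such as appearance can influence every spatial location of the denoised image.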

     
