
A Review and Outlook of Image and Video Generation Methods Based on Generative Artificial Intelligence

AIGC-based Image and Video Generation Method: A Review

  • Abstract: With the rapid advancement of artificial intelligence, generative AI (AIGC) has made significant progress in visual generation. Visual generation aims to produce visual content such as images and videos from random noise or from various input conditions. It plays an increasingly important role across fields ranging from creative domains such as art and entertainment to critical areas such as medical imaging and digital publishing, and its development may fundamentally change how we interact with visual data. This paper first introduces the classical generative model frameworks of the deep learning era. Then, organized by input condition, it reviews several important classes of image generation models and methods from recent years, along with their applications in image editing. Next, organized by training data requirements, it summarizes recent video generation and editing models, chiefly those based on diffusion models, together with their respective advantages and disadvantages. It then reviews the classic datasets for image and video generation and the commonly used evaluation metrics. Finally, it summarizes the challenges currently facing visual generation and discusses potential future research directions.

     
