Abstract:
Visual generation plays an increasingly important role across diverse fields, from creative domains such as art and entertainment to critical areas such as medical imaging and digital publishing. The development of AIGC for visual generation has the potential to revolutionize how we interact with visual data. This paper first introduces the classical generative models of the deep learning era. Then, organized by input condition, it highlights several important image generation tasks developed in recent years, including unconditional image generation, class-to-image generation, text-to-image generation, and image-to-image translation, together with their applications in image editing. Next, it provides a detailed summary of video generation and editing models, especially video diffusion models, and outlines their advantages and disadvantages with respect to training-data requirements. The paper also reviews the classic datasets for image and video generation and the commonly used evaluation metrics. Finally, it summarizes the challenges facing visual generation in terms of data collection, inference efficiency, long video generation, controllable video generation, and security, and discusses potential future research directions.