Abstract:
Visual generation plays an increasingly important role across diverse fields, from creative domains such as art and entertainment to critical areas such as medical imaging and digital publishing. The development of AIGC for visual generation has the potential to revolutionize how we interact with visual data. This paper first introduces the classical generative models of the deep learning era. Then, organized by input condition, it highlights several important image generation tasks developed in recent years, including unconditional image generation, class-to-image generation, text-to-image generation, and image-to-image translation, together with their applications in image editing. Next, it provides a detailed summary of video generation and editing models, especially video diffusion models, and outlines their advantages and disadvantages with respect to training-data requirements. The paper also reviews the classic datasets for image and video generation and the commonly used evaluation metrics. Finally, it summarizes the challenges facing visual generation in terms of data collection, inference efficiency, long video generation, controllable video generation, and security, and discusses potential future research directions.