A Survey on Talking Head Video Generation

宋一飞; 张炜; 陈智能; 姜育刚

A Survey on Talking Head Generation

Abstract

In recent years, deep learning-based generative techniques have significantly advanced virtual digital human technology. This paper surveys talking head video generation, a current hot topic in virtual digital human research with promising applications in film dubbing, animation production, and virtual assistants. It gives a systematic overview of the technology and the state of research from three perspectives: datasets, key technologies, and evaluation strategies. It also introduces the key artificial intelligence technologies involved in the generation process, including visual generation, image recognition, speech recognition, and cross-modal analysis, and traces how they have evolved. Finally, the paper identifies pressing open problems concerning data, models, and evaluation strategies, and uses them to outline future directions, with the aim of informing and inspiring researchers and promoting progress in this area.
