Abstract:
With the rapid development of the metaverse, digital twins, and intelligent interaction technologies, 3D virtual humans serve as a medium bridging the physical world and digital space, yet high-fidelity reconstruction and natural driving remain core challenges at the intersection of computer vision and artificial intelligence. This survey focuses on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), systematically reviewing recent advances in 3D virtual human reconstruction and animation. The discussion follows the common workflow from data-driven reconstruction to content generation and analyzes the field along three key dimensions: dynamic modeling, multimodal alignment, and the integration of physical constraints. On the reconstruction side, the paradigm innovations of NeRF and 3DGS are highlighted: NeRF-based methods model dynamic radiance fields as implicit continuous functions, optimizing geometric consistency for complex poses in single-view reconstruction, while 3DGS substantially improves the efficiency of non-rigid deformation modeling through explicit editability and real-time rendering. On the driving side, the review categorizes key advances in the audio and visual modalities, including lip synchronization, full facial expression generation, and multimodal emotion-driven methods, and dissects optimization pathways for temporal modeling networks and neural-physical hybrid architectures. It further summarizes open challenges, such as cross-modal temporal misalignment and the efficiency-fidelity trade-off in dynamic modeling, and proposes future directions including physics-guided learning and lightweight architecture design. This work aims to provide comprehensive technical insights and a theoretical foundation for constructing a virtual-real integrated digital ecosystem.