Abstract:
With the rapid development of the metaverse, digital twins, and intelligent interaction technologies, 3D virtual humans serve as a medium bridging the physical world and digital space, yet high-fidelity reconstruction and natural driving remain core challenges at the intersection of computer vision and artificial intelligence. This survey focuses on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), systematically reviewing recent advances in 3D virtual human reconstruction and animation. The discussion follows the common workflow from data-driven reconstruction to content generation and analyzes the field along three key dimensions: dynamic modeling, multimodal alignment, and the integration of physical constraints. On the reconstruction side, the paradigm innovations of NeRF and 3DGS are highlighted: NeRF-based methods model dynamic radiance fields as implicit continuous functions, optimizing geometric consistency for complex poses in single-view reconstruction, while 3DGS substantially improves the efficiency of non-rigid deformation modeling through explicit editability and real-time rendering. On the driving side, the review categorizes key advances in the audio and visual modalities, including lip synchronization, full facial expression generation, and multimodal emotion-driven methods, and dissects optimization pathways for temporal modeling networks and neural-physical hybrid architectures. It further summarizes open challenges, such as cross-modal temporal misalignment and the efficiency-fidelity trade-off in dynamic modeling, and proposes future directions including physics-guided learning and lightweight architecture design. This work aims to provide comprehensive technical insights and a theoretical foundation for constructing a virtual-real integrated digital ecosystem.