Abstract:
The demand for high-quality dance motion generation is widespread in the film and animation industries. However, existing methods fail to adapt to the varying skeletal proportions and geometric shapes of different virtual characters, and they lack the flexibility to control dance style. To address these issues, we propose a method for shape-aware, music-driven, stylized dance motion generation. First, we construct DanceDB++, a dance motion dataset of virtual characters with diverse skeletal proportions and geometric shapes, built with a motion retargeting method that accounts for each character's body shape. Second, we propose a shape-aware, music-driven diffusion model for dance motion generation, which uses a cross-attention mechanism to fuse music features and body shape parameters into the conditioning signal, and we introduce auxiliary training objectives based on skeletal proportions. Additionally, we introduce a semantic style guidance module that extracts style representations from text or example motions and injects them into the diffusion model through adaptive instance normalization layers, enabling flexible control over the generated dance style. Quantitative experiments on DanceDB++ show that the proposed method outperforms the compared methods on metrics such as the physical foot contact score. In qualitative experiments, a series of stylized dance motions controlled by text descriptions or example motions demonstrates that the proposed method effectively achieves stylized guidance in dance motion generation. The dataset and the code, implemented in the Jittor framework, will be publicly released.
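To make the conditioning and style-injection mechanisms concrete, the following is a minimal, illustrative Python sketch (PyTorch-style, not the released Jittor code): cross-attention fuses music features with body shape parameters, and an adaptive instance normalization layer modulates intermediate features with a style embedding derived from text or an example motion. All module names, feature dimensions, and tensor shapes are assumptions introduced here for illustration.

```python
# Illustrative sketch of shape-aware conditioning and AdaIN-based style injection.
# Dimensions (music_dim, shape_dim, d_model, style_dim) are hypothetical placeholders.
import torch
import torch.nn as nn


class ShapeAwareCondition(nn.Module):
    """Fuse per-frame music features with body shape parameters via cross-attention."""

    def __init__(self, music_dim=256, shape_dim=16, d_model=256, n_heads=4):
        super().__init__()
        self.music_proj = nn.Linear(music_dim, d_model)
        self.shape_proj = nn.Linear(shape_dim, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, music, shape):
        # music: (B, T, music_dim), shape: (B, shape_dim)
        q = self.music_proj(music)                 # queries from music features
        kv = self.shape_proj(shape).unsqueeze(1)   # keys/values from shape parameters
        fused, _ = self.cross_attn(q, kv, kv)      # shape-aware condition, (B, T, d_model)
        return fused


class AdaIN(nn.Module):
    """Adaptive instance normalization: modulate features with a style embedding."""

    def __init__(self, feat_dim=256, style_dim=512):
        super().__init__()
        self.to_scale_shift = nn.Linear(style_dim, 2 * feat_dim)

    def forward(self, x, style):
        # x: (B, T, feat_dim); style: (B, style_dim) from text or an example motion
        mean = x.mean(dim=1, keepdim=True)         # per-channel statistics over time
        std = x.std(dim=1, keepdim=True) + 1e-6
        scale, shift = self.to_scale_shift(style).unsqueeze(1).chunk(2, dim=-1)
        return (1 + scale) * (x - mean) / std + shift
```

In such a setup, the fused music-and-shape condition would drive the denoising network of the diffusion model, while the AdaIN layers would be interleaved with its feature blocks to steer the generated dance toward the requested style.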