Abstract:
To address the poor generalization of supervised style transfer methods to unseen content, as well as the style inconsistency and unnaturalness of motions generated by unsupervised methods, a semi-supervised motion style transfer method is proposed. The method employs two encoders to extract content and style features from the content and style motions, respectively, and uses a decoder to fuse these features and generate the stylized motion. To enhance generalization, a large-scale unlabeled motion dataset is used to augment the content motion dataset, and a content-preservation loss together with an end-effector loss is applied to ensure that the generated result resembles the input content motion. To maintain style consistency, the generated motion is fed back into the style encoder to re-extract its style feature, and a triplet loss is computed among the original style feature, the re-extracted style feature, and a feature of a different style. Compared with methods such as Motion Puzzle on two public datasets, BFA and CMU, the proposed method achieves significantly better MID scores and user-study results.
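To make the style-consistency constraint concrete, the following is a minimal sketch of how the triplet loss over re-extracted style features could be formed, assuming a PyTorch setup; the `style_encoder` module, tensor shapes, and function names are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def style_consistency_loss(style_encoder, generated_motion,
                           style_feat, other_style_feat, margin=1.0):
    """Triplet loss on re-extracted style features (illustrative sketch).

    anchor   : style feature re-extracted from the generated motion
    positive : style feature of the original style motion
    negative : style feature of a different style
    """
    regenerated_style = style_encoder(generated_motion)
    return F.triplet_margin_loss(regenerated_style, style_feat,
                                 other_style_feat, margin=margin)

# Hypothetical usage with dummy shapes: batch of 8, 256-dim style features,
# and a stand-in encoder mapping generated motion to a style embedding.
if __name__ == "__main__":
    dummy_encoder = torch.nn.Linear(512, 256)     # placeholder for the style encoder
    generated = torch.randn(8, 512)               # placeholder generated-motion representation
    style = torch.randn(8, 256)                   # original style feature
    other_style = torch.randn(8, 256)             # feature of a different style
    loss = style_consistency_loss(dummy_encoder, generated, style, other_style)
    print(loss.item())
```

The design intent, as described in the abstract, is that re-encoding the output pulls its style embedding toward the reference style and pushes it away from other styles, which discourages style drift in the generated motion.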