Stochastic Human Motion Prediction Based on a Denoising Diffusion Probabilistic Model
Abstract: Human motion prediction forecasts future motion sequences from a given history of observed motion and can provide anticipatory cues for applications such as intelligent surveillance and human-computer interaction. However, the randomness and uncertainty of human motion make prediction very difficult. Most current methods either produce insufficiently diverse predictions or produce results that fall outside the plausible range of human motion. To address these problems, a stochastic human motion prediction method based on the denoising diffusion probabilistic model is proposed. First, a spatio-temporal Transformer denoising diffusion prediction network is constructed: the spatial Transformer module encodes joint embeddings and captures local relationships among 3D joints within a single frame, while the temporal Transformer module captures global dependencies across frames, improving the consistency between the predicted motion and the historical motion sequence. Then, a refinement module for the predicted motion sequence is designed, which introduces a GCN-AT residual module in the discrete cosine transform (DCT) space to refine the predictions. This yields more natural and fluent motion sequences, improves prediction accuracy, and resolves stuttering and incoherence in the predicted motion. Experimental results on benchmark data show that the proposed method significantly outperforms the compared methods in accuracy and fidelity and achieves the best values on the FDE and MMADE metrics.
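To illustrate why refinement in the discrete cosine transform space can reduce stuttering, the following minimal sketch transforms a single joint coordinate trajectory with an orthonormal DCT-II, discards the high-frequency coefficients, and transforms back. It is not the GCN-AT residual module described in the abstract, which is a learned network; the function names (`dct_matrix`, `smooth_trajectory`, `roughness`), the cutoff `keep`, and the toy trajectory are all assumptions made for illustration.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as n rows; row k holds frequency k over n frames."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows


def dct(signal, basis):
    """Project a time-domain trajectory onto the DCT basis."""
    return [sum(b * x for b, x in zip(row, signal)) for row in basis]


def idct(coeffs, basis):
    """Invert the orthonormal DCT (the inverse is the transpose of the basis)."""
    n = len(basis)
    return [sum(coeffs[k] * basis[k][t] for k in range(n)) for t in range(n)]


def smooth_trajectory(signal, keep):
    """Hypothetical helper: keep only the `keep` lowest-frequency coefficients."""
    basis = dct_matrix(len(signal))
    coeffs = dct(signal, basis)
    truncated = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(truncated, basis)


def roughness(signal):
    """Sum of squared second differences; larger values mean jerkier motion."""
    return sum((signal[t + 1] - 2 * signal[t] + signal[t - 1]) ** 2
               for t in range(1, len(signal) - 1))


# Toy example (assumed data): a slow drift plus frame-to-frame jitter.
trajectory = [0.1 * t + (0.05 if t % 2 else -0.05) for t in range(20)]
smoothed = smooth_trajectory(trajectory, keep=5)
print(roughness(trajectory), roughness(smoothed))
```

Because the frame-to-frame jitter lives mostly in the high-frequency coefficients, truncating them lowers the roughness measure while preserving the overall drift. A learned refinement module operating on DCT coefficients can exploit the same separation between coarse motion and fine temporal detail.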