Video Expression Recognition Method Based on Facial Motion Unit and Temporal Attention

Hu Min; Hu Pengyuan; Ge Peng; Wang Xiaohua; Zhang Kui; Ren Fuji

doi:10.3724/SP.J.1089.2023.19284

Video Expression Recognition Method Based on Facial Motion Unit and Temporal Attention

Abstract: Expression intensity is inconsistent within video sequences, which makes it difficult for a long short-term memory (LSTM) network to extract expression features effectively. To address this problem, a video expression recognition method based on facial motion units and temporal attention is proposed. First, a temporal attention module is added to a convolutional LSTM (ConvLSTM) to model the video sequence temporally, reducing dimensionality while retaining rich facial image feature information. Second, a face image segmentation rule based on facial motion units is proposed to address the difficulty of delimiting the active regions of facial expressions. Finally, a label correction module is embedded in the model to handle sample uncertainty in datasets collected under natural conditions. Experiments on the MMI, Oulu-CASIA, and AFEW datasets show that the model has fewer parameters than published mainstream models and reaches an average recognition accuracy of 87.22% on the MMI dataset, higher than current mainstream methods; overall, it outperforms current representative methods.
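
The abstract does not spell out how the temporal attention module is formulated. As a generic illustration only, and not the authors' implementation, the sketch below shows one common form of temporal attention: each frame's feature vector gets a scalar score, the scores are normalized with a softmax over time, and the frames are averaged with those weights so that high-intensity frames contribute more. The function name `temporal_attention_pool`, the scoring vector `score_weights`, and the toy feature values are all assumptions made for this example; in a trained model the per-frame features would come from the ConvLSTM and the scoring parameters would be learned.

```python
import math


def temporal_attention_pool(frame_features, score_weights):
    """Softmax-weighted average of per-frame feature vectors.

    frame_features: list of T feature vectors (each a list of D floats).
    score_weights:  hypothetical scoring vector of length D (learned in practice).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame: dot product with the scoring vector.
    scores = [
        sum(f * w for f, w in zip(frame, score_weights))
        for frame in frame_features
    ]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum over time collapses T x D features into one D-vector.
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy sequence (assumed values): neutral frame, onset frame, peak-intensity frame.
frames = [[0.1, 0.0], [0.5, 0.2], [1.0, 0.9]]
pooled, weights = temporal_attention_pool(frames, score_weights=[2.0, 1.0])
```

With these toy values the peak-intensity frame receives the largest weight, which is the behavior that motivates attention when expression intensity varies along the sequence.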
