Landmark-Based Facial Expression Recognition by Joint Training of Multiple Networks

Xia Tian; Zhang Yifeng; Liu Yuan

doi:10.3724/SP.J.1089.2019.17342

Abstract

Because an expression image sequence carries more information than a single expression image, recognition based on sequences tends to achieve better results. For expression image sequences, this paper proposes a recognition method that relies only on facial landmark information and on the joint training of two deep neural networks. First, a fixed number of frames is extracted from an image sequence of variable length, chosen so that the differences among the selected frames are maximized. Second, the landmark coordinates of every image in this subset are extracted and preprocessed. Third, the coordinates are fed to a microcosmic deep network (MIC-NN) and a macroscopic deep network (MAC-NN), which are trained independently. Finally, the two networks are trained jointly with a loss function that penalizes the differences between MIC-NN and MAC-NN, and their fusion network (FUS-NN) serves as the final prediction model. Experiments on the CK+, Oulu-CASIA, and MMI datasets show that the recognition rate of FUS-NN exceeds that of most known methods by 1% to 15% and trails the best model by only 2% on MMI. At the same time, the time complexity of FUS-NN is far lower than that of models with similar performance, giving a better balance between recognition rate and computing resources.
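The abstract names two steps without giving their formulas: choosing a fixed-size subset of frames that differ as much as possible, and a joint loss that penalizes disagreement between MIC-NN and MAC-NN. The sketch below is one possible reading, not the authors' implementation. The greedy farthest-point rule in `select_frames`, the squared difference between softmax outputs in `joint_loss`, the weight `lam`, and all function names are assumptions introduced for illustration.

```python
import numpy as np


def select_frames(landmarks, k):
    """Pick k mutually distant frames (assumed greedy farthest-point rule).

    landmarks: array of shape (T, P, 2), P (x, y) landmarks for each of T frames.
    Returns the sorted indices of the chosen frames.
    """
    flat = np.asarray(landmarks, dtype=float).reshape(len(landmarks), -1)
    if k >= len(flat):
        return list(range(len(flat)))
    chosen = [0]  # assumption: the first frame is always kept
    # Distance from each frame to its nearest already-chosen frame.
    nearest = np.linalg.norm(flat - flat[0], axis=1)
    nearest[0] = -np.inf  # never pick the same frame twice
    while len(chosen) < k:
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(flat - flat[nxt], axis=1))
        nearest[chosen] = -np.inf
    return sorted(chosen)


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def joint_loss(logits_mic, logits_mac, labels, lam=0.1):
    """Cross-entropy of both networks plus an assumed disagreement penalty.

    logits_mic, logits_mac: arrays of shape (N, C) from the two networks.
    labels: integer class indices of shape (N,).
    lam: hypothetical weight of the disagreement term.
    """
    p_mic, p_mac = softmax(logits_mic), softmax(logits_mac)
    rows = np.arange(len(labels))
    ce_mic = -np.log(p_mic[rows, labels]).mean()
    ce_mac = -np.log(p_mac[rows, labels]).mean()
    # Mean squared difference between the two predicted distributions.
    disagreement = ((p_mic - p_mac) ** 2).sum(axis=1).mean()
    return ce_mic + ce_mac + lam * disagreement
```

Under these assumptions, the penalty is zero when both networks output the same distribution and grows as their predictions diverge, so `lam` controls how strongly joint training pulls the two networks toward agreement.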
