Context Correlation Distillation for Lip Reading
Graphical Abstract
Abstract
A cross-modal knowledge distillation method, C2KD (context correlation knowledge distillation), is proposed to address the problem that lip reading model performance is limited by dataset size. C2KD distills multi-scale context correlation from a speech recognition model to a lip reading model. First, the self-attention modules of the Transformer model are used to obtain context correlation knowledge. Second, a layer mapping strategy decides which layers of the speech recognition model to extract knowledge from. Finally, an adaptive training process dynamically transfers the speech recognition model's knowledge according to the lip reading model's performance. C2KD achieves competitive performance on the LRS2 and LRS3 datasets, outperforming the baseline by margins of 2.0% and 2.7% in word error rate, respectively.
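To make the core distillation step concrete, the following is a minimal PyTorch sketch, not the paper's implementation: it assumes the context correlation of one mapped layer pair is formed from that layer's hidden states via scaled dot-product similarity (a stand-in for the Transformer self-attention maps), and that the student (lip reading) map is matched to the frozen teacher (speech recognition) map with a KL-divergence loss whose weight could be adapted during training. All names, shapes, and the choice of loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def context_correlation(features: torch.Tensor) -> torch.Tensor:
    """Softmax-normalised pairwise similarity between time steps.

    features: (batch, time, dim) hidden states of one Transformer layer.
    Returns a (batch, time, time) correlation map, a simplified stand-in
    for the self-attention maps described in the abstract.
    """
    scores = torch.matmul(features, features.transpose(1, 2))
    scores = scores / features.size(-1) ** 0.5
    return F.softmax(scores, dim=-1)


def c2kd_layer_loss(student_feats: torch.Tensor,
                    teacher_feats: torch.Tensor,
                    weight: float = 1.0) -> torch.Tensor:
    """KL divergence between student and teacher correlation maps.

    `weight` is a placeholder for the adaptive transfer coefficient that
    would depend on the lip reading model's current performance.
    """
    s_corr = context_correlation(student_feats)
    t_corr = context_correlation(teacher_feats).detach()  # teacher is frozen
    return weight * F.kl_div(s_corr.log(), t_corr, reduction="batchmean")


# Toy usage: random features standing in for one mapped layer pair.
student = torch.randn(2, 50, 256, requires_grad=True)  # lip reading hidden states
teacher = torch.randn(2, 50, 256)                      # speech recognition hidden states
loss = c2kd_layer_loss(student, teacher)
loss.backward()
```

In a full multi-scale setup, this loss would be computed for every teacher-student layer pair selected by the layer mapping strategy and summed with the main recognition objective.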