Deformable Feature Map Residual Network for Urban Sound Recognition
Abstract: To address the difficulty of extracting features from time-frequency images in urban sound recognition, a deformable feature map residual network is proposed. First, a deformable feature map residual block is designed, consisting of an offset layer and a convolutional layer. The offset layer shifts the pixels of the input feature map, and the shifted feature map is superimposed, through a shortcut connection, on the feature map extracted by the convolutional layer, so that the network concentrates its sampling on the regions of interest in the feature map and passes the shifted feature map information on to subsequent layers. Second, a deformable convolution residual network is designed. Finally, the features extracted by this network are fused with the Mel-frequency cepstral coefficients (MFCCs) of the urban sound, re-weighted by a squeeze-and-excitation block, and fed into a fully connected layer for classification. Experiments on the Urban Sound dataset show that the proposed method improves urban sound recognition accuracy by more than 5% compared with convolutional neural network methods.
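
The residual block described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes a single-channel feature map, one fixed integer offset `(dy, dx)` shared by all pixels (the offsets in the paper are presumably learned), zero filling at the borders, and a convolution branch that reads the unshifted input. The function names `shift_map`, `conv3x3_same`, and `deformable_residual_block` are hypothetical.

```python
def shift_map(fmap, dy, dx):
    """Offset layer (assumed form): move every pixel by (dy, dx), zero-filling gaps."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = fmap[sy][sx]
    return out


def conv3x3_same(fmap, kernel):
    """3x3 cross-correlation with zero padding; output has the input's size."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    sy, sx = y + ky - 1, x + kx - 1
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += kernel[ky][kx] * fmap[sy][sx]
            out[y][x] = acc
    return out


def deformable_residual_block(fmap, kernel, dy, dx):
    """Add the shifted map (shortcut branch) to the convolved map (conv branch)."""
    shifted = shift_map(fmap, dy, dx)      # shortcut carries the shifted map
    convolved = conv3x3_same(fmap, kernel) # assumption: conv reads the unshifted input
    return [[s + c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(shifted, convolved)]


if __name__ == "__main__":
    feature_map = [[1, 2], [3, 4]]
    identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # conv branch returns its input
    print(deformable_residual_block(feature_map, identity_kernel, dy=0, dx=1))
```

With the identity kernel, the output is the input plus a copy of itself shifted one pixel to the right, which makes the role of the shortcut branch easy to inspect by hand.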