Deformable Feature Map Residual Network for Urban Sound Recognition
-
Abstract
To address the difficulty of extracting features from time-frequency images in urban sound recognition, a deformable feature map residual network is proposed. First, a deformable feature map residual block is designed, consisting of an offset layer and a convolution layer. The offset layer shifts the pixels of the input feature maps, and the shifted feature maps are added, through a shortcut connection, to the feature maps extracted by the convolution. This lets the network concentrate its sampling on the regions of interest in the feature maps and pass the information in the shifted feature maps on to deeper layers. Second, a deformable convolution residual network is designed. Finally, the features extracted by the network are fused with the Mel-frequency cepstral coefficients (MFCCs) of the urban sound. The fused features are re-weighted by a squeeze-and-excitation block and then fed into a fully connected layer for classification. Experiments were carried out on the Urban Sound Database. The results show that the accuracy of the proposed method is more than 5% higher than that of convolutional neural network methods.
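The two mechanisms named in the abstract, the shortcut that carries a shifted feature map and the squeeze-and-excitation re-weighting of the fused features, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a single-channel feature map, one fixed integer offset (dy, dx) in place of learned offsets, a convolution applied to the unshifted input, and fusion by concatenation into a vector, so that the squeeze step reduces to the identity. All function names and weights are hypothetical.

```python
import math

def shift_map(fmap, dy, dx):
    """Shift a 2-D feature map by an integer offset (dy, dx), zero-filling the border.
    Assumption: one fixed offset for the whole map instead of learned offsets."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that lands on (y, x)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = fmap[sy][sx]
    return out

def conv3x3(fmap, kernel):
    """Same-size 3x3 convolution (cross-correlation form) with zero padding."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    sy, sx = y + ky - 1, x + kx - 1
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += kernel[ky][kx] * fmap[sy][sx]
            out[y][x] = acc
    return out

def deformable_residual_block(fmap, kernel, dy, dx):
    """Add the shifted map (shortcut branch) to the convolved map.
    Assumption: the convolution sees the unshifted input."""
    shifted = shift_map(fmap, dy, dx)
    conv = conv3x3(fmap, kernel)
    return [[c + s for c, s in zip(conv_row, shift_row)]
            for conv_row, shift_row in zip(conv, shifted)]

def se_reweight(features, w1, w2):
    """Squeeze-and-excitation style gating of a fused feature vector.
    w1 (hidden x n) and w2 (n x hidden) are hypothetical weight matrices."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    return [f * g for f, g in zip(features, gates)]

# Usage: an identity kernel makes the two branches easy to tell apart.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
block_out = deformable_residual_block([[1, 2], [3, 4]], identity, 0, 1)

# Hypothetical fusion by concatenation: network features followed by MFCCs.
fused = [1.0, 2.0] + [0.5]
reweighted = se_reweight(fused, [[0.0, 0.0, 0.0]], [[0.0], [0.0], [0.0]])
```

With the identity kernel, `block_out` is the input plus a copy of itself shifted one pixel to the right. With all-zero weights, every gate equals 0.5, so each fused feature is halved. A trained network would learn both the offsets and the gate weights.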
-
-