Deep Cross-Modal Synthesis of Environmental Sound
Graphical Abstract
Abstract
With the continuous development of computer graphics technology, users place increasingly high demands on the sound that accompanies video and animation. To address the high complexity and poor scalability of current methods, this paper proposes a novel deep environmental sound synthesis algorithm based on a generative adversarial network and a sample-level recurrent neural network. First, deep features of the video are extracted with the Visual Geometry Group (VGG) network model. Then, a novel synchronous sequential network model is proposed to perform the cross-modal transformation from visual to audio features with a higher synchronization rate. Finally, the generated sound is enhanced by a timbre enhancement network model to improve scalability. Training and testing on 12 types of video from an audio-video cross-modal dataset, together with subjective and objective evaluation of the results, show that the generated sounds are realistic and that the proposed method is scalable.
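To make the pipeline described above concrete, the following is a minimal sketch of a visual-to-audio model of this general kind: per-frame VGG features feed a recurrent decoder that emits waveform samples aligned with the frame sequence. It is not the authors' implementation; the class name VisualToAudio, the GRU decoder, and the assumed frame and sample rates are illustrative, and the GAN/SampleRNN-based synchronous sequential network and the timbre enhancement stage are omitted.

```python
# A minimal sketch, not the paper's method. All module and parameter names
# here are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision


class VisualToAudio(nn.Module):
    """Toy pipeline: per-frame VGG features -> recurrent decoder -> waveform chunks."""

    def __init__(self, samples_per_frame=735, hidden=512):
        # 735 = 22050 Hz / 30 fps (assumed audio and video rates)
        super().__init__()
        vgg = torchvision.models.vgg19()          # pretrained weights would be loaded in practice
        self.backbone = vgg.features              # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)       # collapse spatial dims to one vector per frame
        self.rnn = nn.GRU(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, samples_per_frame)  # audio samples emitted per video frame

    def forward(self, frames):                    # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.pool(self.backbone(frames.reshape(b * t, c, h, w)))
        feats = feats.reshape(b, t, -1)           # (batch, time, 512) visual feature sequence
        out, _ = self.rnn(feats)                  # temporally aligned hidden states
        return self.head(out).reshape(b, -1)      # (batch, time * samples_per_frame) waveform


if __name__ == "__main__":
    model = VisualToAudio()
    video = torch.randn(1, 8, 3, 224, 224)        # 8 RGB frames at 224x224
    audio = model(video)
    print(audio.shape)                            # torch.Size([1, 5880])
```

In the paper's design, the recurrent decoder sketched here would be replaced by the proposed synchronous sequential network trained adversarially, and its output would then pass through the timbre enhancement network.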