Deep Cross-Modal Synthesis of Environmental Sound
Graphical Abstract
Abstract
With the continuous development of computer graphics technology, users place increasingly high demands on the sound that accompanies video and animation. To address the high complexity and poor scalability of current methods, this paper proposes a novel deep environmental sound synthesis algorithm based on a generative adversarial network and a sample-level recurrent neural network. First, deep features of the video are extracted with the Visual Geometry Group (VGG) network model. Then, a novel synchronous sequential network model is proposed to perform the cross-modal transformation from visual to audio features with a higher synchronization rate. Finally, the generated sound is enhanced by a timbre enhancement network model to improve scalability. Training and testing on 12 types of video from an audio-video cross-modal dataset, together with subjective and objective evaluation of the results, show that the generated sounds are realistic and that the proposed method is scalable.
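To make the pipeline described above concrete, the following is a minimal sketch of a visual-to-audio model of this general kind: per-frame VGG features feed a recurrent decoder that emits waveform samples aligned with the frame sequence. It is not the authors' implementation; the class name VisualToAudio, the GRU decoder, and the assumed frame and sample rates are illustrative, and the GAN/SampleRNN-based synchronous sequential network and the timbre enhancement stage are omitted.

```python
# A minimal sketch, not the paper's method. All module and parameter names
# here are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision


class VisualToAudio(nn.Module):
    """Toy pipeline: per-frame VGG features -> recurrent decoder -> waveform chunks."""

    def __init__(self, samples_per_frame=735, hidden=512):
        # 735 = 22050 Hz / 30 fps (assumed audio and video rates)
        super().__init__()
        vgg = torchvision.models.vgg19()          # pretrained weights would be loaded in practice
        self.backbone = vgg.features              # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)       # collapse spatial dims to one vector per frame
        self.rnn = nn.GRU(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, samples_per_frame)  # audio samples emitted per video frame

    def forward(self, frames):                    # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.pool(self.backbone(frames.reshape(b * t, c, h, w)))
        feats = feats.reshape(b, t, -1)           # (batch, time, 512) visual feature sequence
        out, _ = self.rnn(feats)                  # temporally aligned hidden states
        return self.head(out).reshape(b, -1)      # (batch, time * samples_per_frame) waveform


if __name__ == "__main__":
    model = VisualToAudio()
    video = torch.randn(1, 8, 3, 224, 224)        # 8 RGB frames at 224x224
    audio = model(video)
    print(audio.shape)                            # torch.Size([1, 5880])
```

In the paper's design, the recurrent decoder sketched here would be replaced by the proposed synchronous sequential network trained adversarially, and its output would then pass through the timbre enhancement network.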