Cheng Haonan, Li Sijia, Liu Shiguang. Deep Cross-Modal Synthesis of Environmental Sound[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(12): 2047-2055. DOI: 10.3724/SP.J.1089.2019.17906

Deep Cross-Modal Synthesis of Environmental Sound

  • As computer graphics technology continues to develop, users place higher demands on the sound that accompanies video and animation. To address the high complexity and poor scalability of current methods, this paper proposes a novel deep environmental sound synthesis algorithm based on a generative adversarial network (GAN) and a sample recurrent neural network. First, deep features of the video are extracted using the Visual Geometry Group (VGG) network model. Then, a novel synchronous sequential network model is proposed to perform the cross-modal feature transformation from visual to audio with a higher synchronization rate. Finally, the generated sound is passed through a timbre enhancement network model to improve scalability. The method is trained and tested on 12 different types of video from an audio-video cross-modal dataset, and subjective and objective evaluations show that the generated results are realistic and that the proposed method is scalable.
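The three stages named in the abstract (visual feature extraction, visual-to-audio transformation, timbre enhancement) can be pictured as a simple data flow. The following is only a minimal sketch of that flow, not the authors' implementation: every function name is hypothetical, and each stage is replaced by a trivial stand-in (mean pixel intensity instead of VGG features, an amplitude-modulated sine instead of the synchronous sequential network, peak normalization instead of the timbre enhancement network). The one property it tries to illustrate is synchronization: a fixed number of audio samples is produced per video frame.

```python
import math
from typing import List

# A grayscale frame as rows of pixel intensities in [0, 1] (assumed format).
Frame = List[List[float]]


def extract_visual_features(frames: List[Frame]) -> List[float]:
    """Hypothetical stand-in for the VGG feature extractor.

    Returns one scalar per frame (the mean pixel intensity).
    """
    features = []
    for frame in frames:
        pixels = [p for row in frame for p in row]
        features.append(sum(pixels) / len(pixels))
    return features


def visual_to_audio(features: List[float], samples_per_frame: int = 4,
                    freq_hz: float = 440.0, sample_rate: int = 16000) -> List[float]:
    """Hypothetical stand-in for the synchronous sequential network.

    Emits exactly `samples_per_frame` samples for each frame feature,
    so the audio stays aligned with the video timeline.
    """
    audio = []
    for i, feature in enumerate(features):
        for k in range(samples_per_frame):
            n = i * samples_per_frame + k  # global sample index
            audio.append(feature * math.sin(2 * math.pi * freq_hz * n / sample_rate))
    return audio


def enhance_timbre(audio: List[float]) -> List[float]:
    """Hypothetical stand-in for the timbre enhancement network.

    Only peak-normalizes the signal; a learned model would do far more.
    """
    peak = max(abs(s) for s in audio)
    if peak == 0:
        return list(audio)
    return [s / peak for s in audio]


def synthesize(frames: List[Frame], samples_per_frame: int = 4) -> List[float]:
    """Chain the three stages in the order the abstract describes."""
    features = extract_visual_features(frames)
    raw_audio = visual_to_audio(features, samples_per_frame)
    return enhance_timbre(raw_audio)
```

For a clip of `len(frames)` frames, `synthesize` returns `len(frames) * samples_per_frame` samples. In a real system each stand-in would be a trained network, but the frame-to-sample alignment would be handled in a comparable way.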
