Xie Zhifeng, Sun Luoyi, Sun Yuzhou, Yu Chunpeng, Ma Lizhuang. Sound Generation Method with Timing-Aligned Visual Feature Mapping[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(10): 1506-1514. DOI: 10.3724/SP.J.1089.2022.19725

Sound Generation Method with Timing-Aligned Visual Feature Mapping

  • To address the problems of existing methods, such as noticeable noise, poor realism, and lack of synchronization with the video, we propose a sound generation method based on timing-aligned visual feature mapping. First, we design a feature aggregation window based on a temporal constraint, which extracts an integrated visual feature from the video sequence. Second, a spatio-temporal matching cross-modal mapping network transforms the integrated visual feature into multi-frequency audio features. Finally, an audio decoder produces a Mel-spectrogram from the audio features, which is sent to a vocoder to output the final waveform. We conducted qualitative and quantitative experiments on the VAS dataset, and the results show that the proposed method significantly improves audio quality, timing alignment, and audience perception.
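The abstract does not specify how the feature aggregation window is built, so the following is only an illustrative sketch of the general idea: each frame's feature vector is combined with its temporal neighbours while keeping one output vector per frame, so the result stays aligned with the video timeline. The function name `aggregate_window`, the mean pooling, the odd `window` size, and the edge clamping are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def aggregate_window(
    frame_features: Sequence[Sequence[float]], window: int = 5
) -> List[List[float]]:
    """Mean-pool each frame's features with its temporal neighbours.

    Hypothetical stand-in for a temporally constrained aggregation window:
    the output has exactly one vector per input frame, so frame index t in
    the output still corresponds to frame t of the video.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(frame_features)
    aggregated = []
    for t in range(n):
        # Clamp the window at the sequence boundaries (assumption).
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        span = frame_features[lo:hi]
        dim = len(span[0])
        # Average every feature dimension over the frames in the window.
        aggregated.append(
            [sum(vec[d] for vec in span) / len(span) for d in range(dim)]
        )
    return aggregated


# Toy example: three frames with one-dimensional features.
features = [[0.0], [3.0], [6.0]]
smoothed = aggregate_window(features, window=3)
```

In a full pipeline of the kind the abstract outlines, per-frame vectors like these would feed a cross-modal mapping network, then an audio decoder and a vocoder; those stages are learned models and are not sketched here.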
