Sound Generation Method with Timing-Aligned Visual Feature Mapping
Graphical Abstract
Abstract
To address the problems of existing methods, such as prominent noise, weak realism, and asynchrony with the video, we propose a sound generation method based on timing-aligned visual feature mapping. First, we design a feature aggregation window based on a temporal constraint, which extracts integrated visual features from the video sequence. Second, the integrated visual features are transformed into multi-frequency audio features by a spatio-temporal matching cross-modal mapping network. Finally, we use an audio decoder to obtain a Mel-spectrogram from the audio features, which is sent to a vocoder to produce the final waveform. We conducted qualitative and quantitative experiments on the VAS dataset, and the results show that the proposed method significantly improves audio quality, timing alignment, and audience perception.
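The abstract describes a three-stage pipeline: temporally constrained visual feature aggregation, cross-modal mapping to audio features, and decoding to a Mel-spectrogram. The following is a minimal sketch of that data flow only; all shapes, function names, and the linear maps are illustrative assumptions, standing in for the paper's learned networks.

```python
import numpy as np

def aggregate_visual_features(frames, window=4):
    # Hypothetical stand-in for the temporally constrained feature
    # aggregation window: average each causal sliding window of
    # per-frame visual features.
    T, _ = frames.shape
    return np.stack([frames[max(0, t - window + 1):t + 1].mean(axis=0)
                     for t in range(T)])  # (T, D_visual)

def cross_modal_map(visual_feats, W):
    # Toy linear visual-to-audio mapping; the paper uses a learned
    # spatio-temporal matching cross-modal network instead.
    return visual_feats @ W  # (T, D_audio)

def decode_to_mel(audio_feats, mel_basis):
    # Toy projection onto mel bins; a real audio decoder would be a
    # neural network producing the Mel-spectrogram for a vocoder.
    return audio_feats @ mel_basis  # (T, n_mels)

rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 8))      # 16 video frames, 8-dim features (assumed)
W = rng.normal(size=(8, 12))           # visual -> audio feature map (assumed)
mel_basis = rng.normal(size=(12, 80))  # audio feature -> 80 mel bins (assumed)

mel = decode_to_mel(cross_modal_map(aggregate_visual_features(frames), W),
                    mel_basis)
print(mel.shape)  # (16, 80)
```

Because the aggregation window is causal and produces one integrated feature per frame, the resulting Mel-spectrogram keeps a one-to-one temporal correspondence with the video frames, which is the timing-alignment property the method targets.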