<div>Global-Local Combined Semantic Generation Network for Video Captioning</div>

Global-Local Combined Semantic Generation Network for Video Captioning[J]. Journal of Computer-Aided Design & Computer Graphics.


Abstract

Semantic features used in video captioning often fail to capture both global, general information and local, detailed information, which degrades caption quality. To address this, a global-local combined semantic generation network (GLS-Net) for video captioning is proposed. Exploiting the complementarity of global and local information, the network includes a global semantic extraction unit and a local semantic extraction unit, both of which adopt a residual multi-layer perceptron (r-MLP) structure to improve feature processing. The algorithm combines general global semantics with detailed local semantics to strengthen the expressive power of the semantic features, and the resulting features are used as the encoding of the video content to improve captioning performance. Experiments are carried out on the MSR-VTT and MSVD datasets, using the semantics-assisted video captioning network (SAVC) as the base network. The results show that GLS-Net outperforms existing comparable algorithms; compared with SAVC, the CIDEr score increases by 6.2% on average.
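The abstract does not give the layer sizes, the pooling scheme, or the way the two semantic vectors are fused, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a two-layer residual MLP (`residual_mlp`), a global branch that mean-pools the frame features before the r-MLP, a local branch that applies the r-MLP to each frame before pooling, and fusion by concatenation. All function names, dimensions, and these design choices are hypothetical.

```python
import random


def init_linear(d_in, d_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    b = [0.0] * d_out
    return w, b


def linear(x, w, b):
    """Dense layer: one output per weight row, y_j = sum_i w[j][i] * x[i] + b[j]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_mlp(x, params):
    """Assumed r-MLP form: x + W2 * relu(W1 * x + b1) + b2 (skip connection)."""
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(x, w1, b1))
    out = linear(hidden, w2, b2)
    return [xi + oi for xi, oi in zip(x, out)]


def mean_pool(frames):
    """Average a list of equal-length feature vectors element-wise."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def combined_semantics(frames, global_params, local_params):
    """Concatenate a global and a local semantic vector (assumed fusion)."""
    # Global branch (assumption): summarize the whole clip, then refine.
    global_vec = residual_mlp(mean_pool(frames), global_params)
    # Local branch (assumption): refine each frame, then summarize.
    local_vec = mean_pool([residual_mlp(f, local_params) for f in frames])
    return global_vec + local_vec  # list concatenation


if __name__ == "__main__":
    rng = random.Random(0)
    dim, hidden_dim, n_frames = 4, 8, 3  # illustrative sizes only
    frames = [[rng.random() for _ in range(dim)] for _ in range(n_frames)]
    g = (init_linear(dim, hidden_dim, rng), init_linear(hidden_dim, dim, rng))
    l = (init_linear(dim, hidden_dim, rng), init_linear(hidden_dim, dim, rng))
    print(len(combined_semantics(frames, g, l)))
```

The skip connection means that each unit starts out close to an identity mapping when its weights are small, which is the usual motivation for residual designs; whether GLS-Net relies on this property is not stated in the abstract.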
