BiGRU-RA Model for Image Chinese Captioning via Global and Local Features

Deng Zhenrong; Zhang Yonglin; Yang Rui; Lan Rushi; Huang Wenming; Luo Xiaonan

doi:10.3724/SP.J.1089.2021.18262

Deng Zhenrong, Zhang Yonglin, Yang Rui, Lan Rushi, Huang Wenming, Luo Xiaonan. BiGRU-RA Model for Image Chinese Captioning via Global and Local Features[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 49-58. DOI: 10.3724/SP.J.1089.2021.18262

Abstract

To address the lack of detailed semantic information in current image captioning models that rely on global features alone, an image Chinese captioning model combining global and local features is proposed. The proposed model adopts the encoder-decoder framework. In the encoding stage, a residual network (ResNet) and Faster R-CNN are used to extract the global and local features of images, respectively, improving the model's utilization of image features at different scales. A bi-directional gated recurrent unit (BiGRU) with an embedded visual attention structure and a residual connection structure is applied as the decoder (BiGRU with residual connection and attention, BiGRU-RA). The model can adaptively allocate weights to image features and text, and improves the mapping between image feature regions and context information. Additionally, a reinforcement learning-based policy gradient is added to improve the model's loss function and directly optimize the CIDEr evaluation metric. Training and experiments are conducted on the AI Challenger Chinese captioning dataset. The comparative results show that the proposed model obtains better scores and that the generated captions are more accurate and detailed.
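As a rough illustration of two ideas named in the abstract, the sketch below shows soft attention over local region features and a policy-gradient loss driven by a caption-level reward such as CIDEr. It is not the authors' implementation: the dot-product scoring, the scalar baseline, and the names `attend` and `policy_gradient_loss` are all assumptions made for this example, and the abstract does not specify any of them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, regions):
    """Soft attention over local region features.

    Assumption: scores are plain dot products between the decoder state
    (`query`) and each region vector; the paper may use a different scorer.
    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return weights, context


def policy_gradient_loss(log_probs, reward, baseline):
    """REINFORCE-style surrogate loss for one sampled caption.

    `log_probs` are the log-probabilities of the sampled words, `reward`
    is a caption-level score (for example CIDEr), and `baseline` is a
    hypothetical variance-reduction term. Minimizing this loss raises the
    probability of captions whose reward exceeds the baseline.
    """
    return -(reward - baseline) * sum(log_probs)


if __name__ == "__main__":
    # Two toy 2-D region features and a decoder state aligned with the first.
    weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, context)
    print(policy_gradient_loss([-0.5, -0.5], reward=1.0, baseline=0.5))
```

In a full model the context vector would be combined with the global image feature and fed to the decoder at each step. The reward would come from a CIDEr scorer run on the sampled caption.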
