Deng Zhenrong, Zhang Yonglin, Yang Rui, Lan Rushi, Huang Wenming, Luo Xiaonan. BiGRU-RA Model for Image Chinese Captioning via Global and Local Features[J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(1): 49-58. DOI: 10.3724/SP.J.1089.2021.18262

BiGRU-RA Model for Image Chinese Captioning via Global and Local Features

  • To address the lack of detailed semantic information in current image captioning models based on global features, an image Chinese captioning model combining global and local features is proposed. The proposed model adopts the encoder-decoder framework. In the encoding stage, a residual network (ResNet) and Faster R-CNN are used to extract the global and local features of images, respectively, improving the model's utilization of image features at different scales. A bi-directional gated recurrent unit (BiGRU) with an embedded visual attention structure and a residual connection structure is applied as the decoder (BiGRU with residual connection and attention, BiGRU-RA). The model can adaptively allocate weights to image features and text, and improves the mapping between image feature regions and context information. Additionally, a reinforcement learning-based policy gradient is added to improve the loss function of the model and directly optimize the CIDEr evaluation metric. Training and experiments are conducted on the AI Challenger Chinese captioning dataset. The comparative results show that the proposed model obtains better scores and that the generated captions are more accurate and detailed.
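
The policy-gradient step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the REINFORCE-style form below, the subtraction of a baseline reward, and all names and numbers (policy_gradient_loss, the sample probabilities, the reward values) are assumptions for illustration only, not the authors' implementation. The idea is that a sentence-level score such as CIDEr scales the log-likelihood of a sampled caption, so captions that score above the baseline are made more likely.

```python
import math


def policy_gradient_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE-style loss for one sampled caption (illustrative sketch).

    token_log_probs: log-probabilities the model assigned to each sampled token.
    reward: sentence-level score of the sampled caption, e.g. its CIDEr score.
    baseline: reference reward subtracted to reduce variance (assumed here;
              the abstract does not say which baseline, if any, is used).
    """
    advantage = reward - baseline
    # Minimizing this loss raises the likelihood of captions whose reward
    # exceeds the baseline and lowers it for captions that fall short.
    return -advantage * sum(token_log_probs)


# Hypothetical values: a three-token caption and made-up reward scores.
sampled_log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]
loss = policy_gradient_loss(sampled_log_probs, reward=1.2, baseline=0.9)
print(f"loss = {loss:.4f}")
```

In a real training loop the log-probabilities would come from the decoder and the gradient of this loss would be backpropagated through them; the reward itself is treated as a constant, which is what allows a non-differentiable metric like CIDEr to be optimized directly.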
