Zhou Zhiping, Zhang Wei. An Image Caption Generation Model Based on Visual Concept Attention and Residual Connection[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(8): 1536-1542. DOI: 10.3724/SP.J.1089.2018.16825

An Image Caption Generation Model Based on Visual Concept Attention and Residual Connection

Enabling machines to describe images automatically has long been a goal of computer vision. To improve the accuracy of image captioning, an image caption model based on a stacked Long Short-Term Memory (LSTM) network is proposed, which combines an adaptive attention mechanism with residual connections. First, the basic LSTM structure is modified, following the pointer network (Pointer-Net), by adding units that record the image's visual attribute information. Then, an adaptive attention mechanism based on the image's visual semantic attributes is designed using the improved LSTM network; it automatically selects the image region to process at the next time step based on the model's hidden layer at the previous time step. In addition, to obtain a closer mapping between the image and its descriptive sentence, a two-layer LSTM network with residual connections is constructed, so that the final model describes an image by combining its visual features with its semantic features. Training and testing are conducted on the MSCOCO and Flickr30K image datasets, and the experimental results show that the proposed model achieves superior performance under different evaluation metrics.
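The two mechanisms named in the abstract, attention over image regions driven by the previous hidden state and a residual connection between stacked layers, can be illustrated with a toy sketch. This is not the authors' model: the dot-product scoring function, the `toy_layer` stand-in for an LSTM layer, the placement of the residual connection, and all names and values below are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(regions, hidden):
    """Soft attention over region features.

    Each region is scored by its dot product with the previous hidden
    state (an assumed scoring function, not taken from the paper).
    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [sum(r * h for r, h in zip(region, hidden)) for region in regions]
    weights = softmax(scores)
    context = [sum(w * region[d] for w, region in zip(weights, regions))
               for d in range(len(hidden))]
    return weights, context

def toy_layer(x):
    """Hypothetical stand-in for an LSTM layer: an element-wise tanh."""
    return [math.tanh(v) for v in x]

def residual_stack(x):
    """Two stacked layers; the second layer's input is added to its output."""
    first = toy_layer(x)
    second = toy_layer(first)
    return [a + b for a, b in zip(first, second)]

# Three made-up 2-D region features and a made-up previous hidden state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]

weights, context = attend(regions, hidden)
output = residual_stack(context)
```

In this toy setup the first and third regions receive equal, larger weights because they align with the hidden state, while the second region is down-weighted. The residual sum lets the first layer's output pass to the result unchanged alongside the second layer's transformation.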