Image Captioning Based on Conditional Generative Adversarial Nets
Abstract
Generating a description of the semantic content of an image is known as image captioning, an emerging research area in computer vision. Although the vanishing- and exploding-gradient problems are mitigated by combining convolutional neural networks (CNNs) with long short-term memory (LSTM), LSTM-based models rely on serialized caption generation, which cannot be parallelized during training, and they tend to forget earlier information as the caption is generated. To address this problem, a CNN-based generation model is trained with conditional generative adversarial training (CGAN) to produce image captions, and performance is further improved by incorporating an attention mechanism. Experiments conducted on the MSCOCO image dataset show that, compared with other CNN-based methods, our method achieves a 2% improvement on CIDEr, which reflects semantic richness, and a 1% improvement on BLEU, which reflects accuracy. The proposed method also outperforms LSTM-based image captioning methods on some evaluation metrics. We conclude that the captions generated by our method describe the image more closely and are semantically richer.
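To make the training scheme described above concrete, the sketch below illustrates one way a CNN-based caption generator can be trained adversarially against a discriminator that scores (image, caption) pairs. This is not the authors' implementation: all module names, layer sizes, and the toy batch are assumptions, and the Gumbel-softmax relaxation is one common workaround for backpropagating through discrete token sampling (the paper may use a different technique, such as policy gradient).

```python
# Illustrative sketch only (assumed architecture, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, IMG_FEAT, SEQ_LEN = 1000, 128, 256, 16  # assumed sizes

class ConvCaptionGenerator(nn.Module):
    """CNN-based generator: causal 1D convolution over token embeddings,
    conditioned on a global image feature added at every position."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.img_proj = nn.Linear(IMG_FEAT, EMB)
        self.conv = nn.Conv1d(EMB, EMB, kernel_size=3, padding=2)
        self.out = nn.Linear(EMB, VOCAB)

    def forward(self, img_feat, tokens):
        x = self.embed(tokens) + self.img_proj(img_feat).unsqueeze(1)
        x = x.transpose(1, 2)                            # (B, EMB, T)
        x = F.relu(self.conv(x)[..., :tokens.size(1)])   # trim right pad -> causal
        return self.out(x.transpose(1, 2))               # (B, T, VOCAB) logits

class CaptionDiscriminator(nn.Module):
    """Scores (image, caption) pairs; captions arrive as soft one-hot
    distributions so gradients can flow back into the generator."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(VOCAB, EMB)               # soft embedding lookup
        self.img_proj = nn.Linear(IMG_FEAT, EMB)
        self.score = nn.Sequential(
            nn.Linear(2 * EMB, EMB), nn.ReLU(), nn.Linear(EMB, 1))

    def forward(self, img_feat, soft_tokens):
        cap = self.embed(soft_tokens).mean(dim=1)        # pooled caption feature
        return self.score(torch.cat([cap, self.img_proj(img_feat)], dim=-1))

G, D = ConvCaptionGenerator(), CaptionDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

# Toy batch: random image features and reference captions (assumption).
img = torch.randn(4, IMG_FEAT)
real = torch.randint(0, VOCAB, (4, SEQ_LEN))
real_soft = F.one_hot(real, VOCAB).float()

# Discriminator step: push real pairs toward 1, generated pairs toward 0.
fake_soft = F.gumbel_softmax(G(img, real), tau=1.0, dim=-1)  # differentiable sampling
loss_d = bce(D(img, real_soft), torch.ones(4, 1)) + \
         bce(D(img, fake_soft.detach()), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to fool the discriminator on the conditioned pair.
loss_g = bce(D(img, fake_soft), torch.ones(4, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Because the generator is convolutional rather than recurrent, all caption positions are processed in one forward pass under teacher forcing, which is the parallelism advantage over LSTM decoding that the abstract refers to.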