A survey on automatic image caption generation

Cited by: 120
Authors
Bai, Shuang [1 ]
An, Shan [2 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Elect & Informat Engn, 3 Shang Yuan Cun, Beijing, Peoples R China
[2] Beijing Jingdong Shangke Informat Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Sentence template; Deep neural networks; Multimodal embedding; Encoder-decoder framework; Attention mechanism; NEURAL-NETWORKS; DEEP; REPRESENTATION; SCENE;
DOI
10.1016/j.neucom.2018.05.080
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Image captioning is the task of automatically generating a caption for an image. As a recently emerged research area, it is attracting increasing attention. To caption an image, the semantic content of the image must be captured and expressed in natural language. Bridging the research communities of computer vision and natural language processing, image captioning is a challenging task, and various approaches have been proposed to solve it. In this paper, we present a survey of advances in image captioning research. Based on the technique adopted, we classify image captioning approaches into different categories. Representative methods in each category are summarized, and their strengths and limitations are discussed. We first discuss methods used in early work, which are mainly retrieval based and template based. Then, we focus on neural network based methods, which achieve state-of-the-art results. Neural network based methods are further divided into subcategories based on the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets, followed by a discussion of future research directions. (C) 2018 Elsevier B.V. All rights reserved.
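The encoder-decoder framework with attention that the abstract highlights can be sketched minimally as follows. This is an illustrative toy, not the survey's implementation: the CNN encoder is stood in for by fixed region features, all weights are random, and the dimensions, vocabulary, and function names are assumptions chosen only to show the data flow of attention-based greedy decoding.

```python
# Toy sketch of attention-based encoder-decoder captioning.
# A real system would use a trained CNN encoder and an LSTM/Transformer
# decoder; here random weights and a tiny RNN illustrate the mechanics.
import numpy as np

rng = np.random.default_rng(0)

V, H, D = 10, 8, 6                          # vocab size, hidden size, feature dim (toy)
W_h = rng.standard_normal((H, H)) * 0.1     # hidden-to-hidden
W_x = rng.standard_normal((V, H)) * 0.1     # word (one-hot) to hidden
W_c = rng.standard_normal((D, H)) * 0.1     # attended context to hidden
W_o = rng.standard_normal((H, V)) * 0.1     # hidden to vocab logits
W_a = rng.standard_normal((D, H)) * 0.1     # attention projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def greedy_caption(features, max_len=5, bos=0, eos=1):
    """At each step: attend over region features, update the RNN hidden
    state with the previous word and the attended context, then greedily
    pick the most likely next word id."""
    h = np.zeros(H)
    word = bos
    out = []
    for _ in range(max_len):
        scores = features @ W_a @ h         # one score per image region
        alpha = softmax(scores)             # soft attention weights
        context = alpha @ features          # weighted sum of region features
        x = np.zeros(V)
        x[word] = 1.0                       # one-hot previous word
        h = np.tanh(h @ W_h + x @ W_x + context @ W_c)
        word = int(np.argmax(h @ W_o))
        if word == eos:
            break
        out.append(word)
    return out

regions = rng.standard_normal((4, D))       # stand-in for CNN region features
caption = greedy_caption(regions)
print(caption)
```

With trained weights, the attention weights `alpha` would shift across regions as each word is emitted, which is the mechanism the attention-based subcategory of the survey covers.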
Pages: 291 - 304
Page count: 14
Related Papers
50 records in total
  • [41] 3G structure for image caption generation
    Yuan, Aihong
    Li, Xuelong
    Lu, Xiaoqiang
    NEUROCOMPUTING, 2019, 330 : 17 - 28
  • [42] Enhancing Efficiency and Quality of Image Caption Generation with CARU
    Huang, Xuefei
    Ke, Wei
    Sheng, Hao
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS (WASA 2022), PT II, 2022, 13472 : 450 - 459
  • [43] Neural Image Caption Generation with Weighted Training and Reference
    Ding, Guiguang
    Chen, Minghai
    Zhao, Sicheng
    Chen, Hui
    Han, Jungong
    Liu, Qiang
    COGNITIVE COMPUTATION, 2019, 11 (06) : 763 - 777
  • [44] Fine-grained attention for image caption generation
    Chang, Yan-Shuo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (03) : 2959 - 2971
  • [46] Content moderation assistance through image caption generation
    Kearns, Liam
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2025, 25
  • [47] Boosting image caption generation with feature fusion module
    Pengfei Xia
    Jingsong He
    Jin Yin
    Multimedia Tools and Applications, 2020, 79 : 24225 - 24239
  • [48] Interpretable Image Caption Generation Based on Dependency Syntax
    Liu M.
    Bi J.
    Zhou B.
    Hu H.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (09) : 2115 - 2126
  • [49] Modeling coverage with semantic embedding for image caption generation
    Jiang, Teng
    Zhang, Zehan
    Yang, Yupu
    VISUAL COMPUTER, 2019, 35 (11) : 1655 - 1665
  • [50] Comparative Evaluation of CNN Architectures for Image Caption Generation
    Katiyar, Sulabh
    Borgohain, Samir Kumar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (12) : 793 - 801