Learn and Tell: Learning Priors for Image Caption Generation

Cited by: 1
Authors
Liu, Pei [1 ,2 ,5 ]
Peng, Dezhong [1 ,3 ]
Zhang, Ming [4 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[4] Nanjing Univ Aeronaut & Astronaut, Coll Econ & Management, Nanjing 211106, Peoples R China
[5] Dept Elect & Comp Engn, 968 Ctr Dr, Gainesville, FL 32611 USA
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 19
Funding
National Natural Science Foundation of China;
Keywords
image captioning; image understanding; probability-being-mentioned prior; part-of-speech prior; LANGUAGE;
DOI
10.3390/app10196942
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
In this work, we propose a novel priors-based attention neural network (PANN) for image captioning, which aims at incorporating two kinds of priors, i.e., the probabilities of being mentioned for local region proposals (PBM priors) and part-of-speech clues for caption words (POS priors), into the visual information extraction process at each word prediction. This work was inspired by the intuitions that region proposals have different inherent probabilities of being mentioned in a caption, and that the POS clues bridge the word class (part-of-speech tag) with the categories of visual features. We propose new methods to extract these two priors: the PBM priors are obtained by computing the similarities between the caption feature vector and the local feature vectors, while the POS priors are predicted at each step of word generation by taking the hidden state of the decoder as input. These two kinds of priors are then incorporated into the PANN module of the decoder to help the decoder extract more accurate visual information for the current word generation. In our experiments, we qualitatively analyzed the proposed approach and quantitatively evaluated several captioning schemes with our PANN on the MS-COCO dataset. Experimental results show that the proposed method achieves better performance and demonstrate the effectiveness of the proposed network for image captioning.
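The two priors described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the cosine similarity, the softmax normalization, the linear POS classifier, and the additive log-prior in the attention scores are all assumptions, and every function name and toy vector below is hypothetical. The sketch also leaves the POS prior as a standalone prediction and does not fold it into the attention step.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def pbm_prior(caption_vec, region_vecs):
    """ASSUMED form of the PBM prior: softmax over the cosine similarity
    between the caption feature vector and each local region vector."""
    return softmax([cosine(caption_vec, r) for r in region_vecs])


def pos_prior(hidden, pos_weights):
    """ASSUMED form of the POS prior: a linear layer on the decoder
    hidden state followed by a softmax over POS tags."""
    return softmax([dot(w, hidden) for w in pos_weights])


def prior_attention(hidden, region_vecs, pbm):
    """ASSUMED combination: add the log of the PBM prior to dot-product
    attention scores, then normalize and pool the region vectors."""
    scores = [dot(hidden, r) + math.log(p) for r, p in zip(region_vecs, pbm)]
    weights = softmax(scores)
    dim = len(region_vecs[0])
    context = [sum(w * r[i] for w, r in zip(weights, region_vecs))
               for i in range(dim)]
    return weights, context


# Toy inputs (hypothetical): three region features, one caption feature,
# one decoder hidden state, and a two-tag POS classifier.
regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
caption = [1.0, 0.1]
hidden = [0.5, 0.5]
pos_w = [[1.0, 0.0], [0.0, 1.0]]

pbm = pbm_prior(caption, regions)
pos = pos_prior(hidden, pos_w)
weights, context = prior_attention(hidden, regions, pbm)
```

In this toy setup the first region is most similar to the caption vector, so it receives the largest PBM prior and, at equal dot-product score, more attention weight than the second region.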
Pages: 1 / 17
Page count: 17
Related papers
50 in total
  • [21] Automatic Image Caption Generation Based on Some Machine Learning Algorithms
    Predic, Bratislav
    Manic, Dasa
    Saracevic, Muzafer
    Karabasevic, Darjan
    Stanujkic, Dragisa
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [22] Image caption generation using transformer learning methods: a case study on instagram image
    Dittakan, Kwankamon
    Prompitak, Kamontorn
    Thungklang, Phutphisit
    Wongwattanakit, Chatchawan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 46397 - 46417
  • [23] Deep learning for ultrasound image caption generation based on object detection
    Zeng X.
    Wen L.
    Liu B.
    Qi X.
    Neurocomputing, 2020, 392 : 132 - 141
  • [24] Automatic Image Caption Generation Based on Some Machine Learning Algorithms
    Predic, Bratislav
    Manic, Dasa
    Saracevic, Muzafer
    Karabasevic, Darjan
    Stanujkic, Dragisa
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [25] Remote sensing image caption generation via transformer and reinforcement learning
    Shen, Xiangqing
    Liu, Bing
    Zhou, Yong
    Zhao, Jiaqi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (35-36) : 26661 - 26682
  • [26] Image Caption Generation using Deep Learning For Video Summarization Applications
    Inayathulla, Mohammed
    Karthikeyan, C.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (01) : 565 - 572
  • [27] Data augmentation to stabilize image caption generation models in deep learning
    Aldabbas H.
    Asad M.
    Ryalat M.H.
    Malik K.R.
    Akbar Qureshi M.Z.
    International Journal of Advanced Computer Science and Applications, 2019, 10 (10) : 571 - 579
  • [28] Remote sensing image caption generation via transformer and reinforcement learning
    Xiangqing Shen
    Bing Liu
    Yong Zhou
    Jiaqi Zhao
    Multimedia Tools and Applications, 2020, 79 : 26661 - 26682
  • [29] Image caption generation using transformer learning methods: a case study on instagram image
    Kwankamon Dittakan
    Kamontorn Prompitak
    Phutphisit Thungklang
    Chatchawan Wongwattanakit
    Multimedia Tools and Applications, 2024, 83 : 46397 - 46417
  • [30] Image Caption Generation With Adaptive Transformer
    Zhang, Wei
    Nie, Wenbo
    Li, Xinle
    Yu, Yao
    2019 34RD YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC), 2019, : 521 - 526