Learn and Tell: Learning Priors for Image Caption Generation

Cited by: 1
Authors
Liu, Pei [1 ,2 ,5 ]
Peng, Dezhong [1 ,3 ]
Zhang, Ming [4 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[4] Nanjing Univ Aeronaut & Astronaut, Coll Econ & Management, Nanjing 211106, Peoples R China
[5] Dept Elect & Comp Engn, 968 Ctr Dr, Gainesville, FL 32611 USA
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, No. 19
Funding
National Natural Science Foundation of China
Keywords
image captioning; image understanding; probability-being-mentioned prior; part-of-speech prior; LANGUAGE;
DOI
10.3390/app10196942
CLC Number
O6 [Chemistry]
Subject Classification Code
0703
Abstract
In this work, we propose a novel priors-based attention neural network (PANN) for image captioning, which incorporates two kinds of priors, i.e., the probabilities of being mentioned for local region proposals (PBM priors) and part-of-speech clues for caption words (POS priors), into the visual information extraction process at each word prediction. This work was inspired by the intuitions that region proposals have different inherent probabilities of being described in a caption, and that POS clues bridge the word class (part-of-speech tag) with the categories of visual features. We propose new methods to extract these two priors: the PBM priors are obtained by computing the similarities between the caption feature vector and the local feature vectors, while the POS priors are predicted at each step of word generation by taking the hidden state of the decoder as input. These two kinds of priors are then incorporated into the PANN module of the decoder to help it extract more accurate visual information for the current word generation. In our experiments, we qualitatively analyzed the proposed approach and quantitatively evaluated several captioning schemes with our PANN on the MS-COCO dataset. Experimental results show that our method achieves better performance and demonstrate the effectiveness of the proposed network for image captioning.
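The abstract describes biasing the decoder's visual attention with per-region priors. The following is a minimal illustrative sketch, not the authors' exact PANN architecture: it assumes a simple dot-product attention in which each region's relevance score is shifted by the log of its probability-of-being-mentioned prior before the softmax, so regions more likely to appear in the caption receive more attention weight. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def priors_weighted_attention(h, regions, pbm_prior):
    """Attend over region features, biasing scores by a per-region prior.

    h         : (d,)   decoder hidden state (the attention query)
    regions   : (k, d) local region feature vectors
    pbm_prior : (k,)   probability-of-being-mentioned prior per region
    Returns the attended context vector (d,) and the attention weights (k,).
    """
    scores = regions @ h / np.sqrt(h.shape[0])   # dot-product relevance
    scores = scores + np.log(pbm_prior + 1e-8)   # bias toward likely-mentioned regions
    weights = softmax(scores)
    context = weights @ regions                  # prior-aware visual context
    return context, weights
```

When all regions are equally relevant to the query, the attention weights reduce to the (renormalized) priors themselves, which makes the effect of the PBM bias easy to verify in isolation.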
Pages: 1-17
Number of pages: 17