Learn and Tell: Learning Priors for Image Caption Generation

Cited by: 1
Authors
Liu, Pei [1, 2, 5]
Peng, Dezhong [1, 3]
Zhang, Ming [4]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[3] Shenzhen Peng Cheng Lab, Shenzhen 518052, Peoples R China
[4] Nanjing Univ Aeronaut & Astronaut, Coll Econ & Management, Nanjing 211106, Peoples R China
[5] Dept Elect & Comp Engn, 968 Ctr Dr, Gainesville, FL 32611 USA
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 19
Funding
National Natural Science Foundation of China;
Keywords
image captioning; image understanding; probability-being-mentioned prior; part-of-speech prior; LANGUAGE;
DOI
10.3390/app10196942
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In this work, we propose a novel priors-based attention neural network (PANN) for image captioning, which incorporates two kinds of priors, namely the probabilities of being mentioned for local region proposals (PBM priors) and part-of-speech clues for caption words (POS priors), into the visual information extraction process at each word prediction. This work was inspired by the intuitions that region proposals have different inherent probabilities of being mentioned in a caption, and that the POS clues bridge the word class (part-of-speech tag) with the categories of visual features. We propose new methods to extract these two priors: the PBM priors are obtained by computing the similarities between the caption feature vector and the local feature vectors, while the POS priors are predicted at each step of word generation by taking the hidden state of the decoder as input. These two kinds of priors are then incorporated into the PANN module of the decoder to help it extract more accurate visual information for the current word generation. In our experiments, we qualitatively analyzed the proposed approach and quantitatively evaluated several captioning schemes with our PANN on the MS-COCO dataset. Experimental results demonstrate that the proposed method achieves better performance and confirm the effectiveness of the proposed network for image captioning.
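The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which similarity measure is used, how the POS predictor is parameterized, or how the priors enter the attention module. The cosine similarity, the softmax normalization, the additive log-prior bias, and every name below (`pbm_prior`, `pos_prior`, `prior_attention`, `pos_region_affinity`) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


def pbm_prior(caption_vec, region_feats):
    """PBM prior per region from caption/region similarity.

    Assumption: cosine similarity followed by softmax; the abstract
    only says "similarities".
    """
    return softmax([cosine(caption_vec, r) for r in region_feats])


def pos_prior(hidden, pos_weights):
    """POS prior predicted from the decoder hidden state.

    Assumption: a single linear layer (one weight row per POS tag)
    followed by softmax.
    """
    return softmax([dot(w, hidden) for w in pos_weights])


def prior_attention(hidden, region_feats, pbm, pos, pos_region_affinity):
    """Attention over regions, biased by both priors.

    Assumption: priors are added to the content score in log/linear
    form. pos_region_affinity[t][i] is a hypothetical compatibility
    between POS tag t and region i.
    """
    scores = []
    for i, region in enumerate(region_feats):
        content = dot(hidden, region)
        pos_bias = sum(p * pos_region_affinity[t][i] for t, p in enumerate(pos))
        scores.append(content + pos_bias + math.log(pbm[i]))
    weights = softmax(scores)
    dim = len(region_feats[0])
    context = [sum(w * r[d] for w, r in zip(weights, region_feats))
               for d in range(dim)]
    return context, weights
```

In this sketch, a region whose feature vector is closer to the caption vector receives a larger PBM prior and therefore a larger attention weight, all else being equal; the POS prior shifts the weights according to whatever tag-to-region affinity is supplied.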
Pages: 1 / 17
Page count: 17
Related papers
50 in total
  • [41] Entity-aware Image Caption Generation
    Lu, Di
    Whitehead, Spencer
    Huang, Lifu
    Ji, Heng
    Chang, Shih-Fu
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4013 - 4023
  • [42] Topic-Based Image Caption Generation
    Sandeep Kumar Dash
    Shantanu Acharya
    Partha Pakray
    Ranjita Das
    Alexander Gelbukh
    Arabian Journal for Science and Engineering, 2020, 45 : 3025 - 3034
  • [43] Topic-Specific Image Caption Generation
    Zhou, Chang
    Mao, Yuzhao
    Wang, Xiaojie
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, CCL 2017, 2017, 10565 : 321 - 332
  • [44] Image caption generation with high-level image features
    Ding, Songtao
    Qu, Shiru
    Xi, Yuling
    Sangaiah, Arun Kumar
    Wan, Shaohua
    PATTERN RECOGNITION LETTERS, 2019, 123 : 89 - 95
  • [45] Automatic Image and Video Caption Generation With Deep Learning: A Concise Review and Algorithmic Overlap
    Amirian, Soheyla
    Rasheed, Khaled
    Taha, Thiab R.
    Arabnia, Hamid R.
    IEEE ACCESS, 2020, 8 (08): : 218386 - 218400
  • [46] Chinese Image Caption Based on Deep Learning
    Luo, Ziyue
    Kang, Huixian
    Yao, Pin
    Wan, Wanggen
    2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2018, : 216 - 220
  • [47] AUDIO CAPTION: LISTEN AND TELL
    Wu, Mengyue
    Dinkel, Heinrich
    Yu, Kai
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 830 - 834
  • [48] Transformer based image caption generation for news articles
    Pande, Ashtavinayak
    Pandey, Atul
    Solanki, Ayush
    Shanbhag, Chinmay
    Motghare, Manish
    INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2023, 14 (01):
  • [49] Bahdanau Attention Based Bengali Image Caption Generation
    Alam, Md Sahrial
    Rahman, Md Sayedur
    Hosen, Md Ikbal
    Mubin, Khairul Anam
    Hossen, Sharif
    Mridha, M. F.
    2022 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATIONS (DASA), 2022, : 1073 - 1077
  • [50] Image Caption Generation with Local Semantic and Global Information
    Liu, Xing
    Liu, Weibin
    Xing, Weiwei
    2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019, : 680 - 685