Image Caption Generation Using Contextual Information Fusion With Bi-LSTM-s

Cited by: 12
Authors
Zhang, Huawei [1]
Ma, Chengbo [1]
Jiang, Zhanjun [1]
Lian, Jing [1]
Affiliation
[1] Lanzhou Jiaotong Univ, Elect & Informat Engn, Lanzhou 730000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Semantics; Visualization; Data mining; Decoding; Task analysis; Logic gates; Bi-LSTM; image caption generation; semantic fusion; semantic similarity;
DOI
10.1109/ACCESS.2022.3232508
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image caption generation requires expressing image content in accurate natural language. In existing encoder-decoder architectures, the decoder generates words one at a time in front-to-back order and cannot analyze the full context. This paper employs a Bi-LSTM (bi-directional long short-term memory) structure, which draws on both past and subsequent information, so that image content is predicted from context clues in both directions. The visual information is fed separately into a forward LSTM decoder (F-LSTM) and a backward LSTM decoder (B-LSTM) to extract semantic information and to complement each other's semantic output. Specifically, a subsidiary attention mechanism, S-Att, acts between F-LSTM and B-LSTM and extracts the semantic information of both decoders. The semantic interaction is extracted according to similarity while the hidden states are aligned, yielding fused semantic information as output. The resulting Bi-LSTM-s model extracts contextual information and achieves finer-grained image captioning. Our model improves on the original LSTM by 9.7%. It also effectively addresses the inconsistency between forward and backward semantic information at the same step, and achieves a BLEU-4 score of 37.5. The superiority of the approach is demonstrated experimentally on the MSCOCO dataset.
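The idea of aligning forward and backward hidden states by similarity and then fusing them can be illustrated with a small NumPy sketch. This is not the paper's S-Att mechanism, whose exact formulation the abstract does not give. It assumes cosine similarity as the alignment score, a softmax over backward time steps as the attention weights, and concatenation as the fusion step. The function name `fuse_hidden_states` and the random inputs are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_hidden_states(h_fwd, h_bwd, eps=1e-8):
    """Fuse forward and backward hidden states by similarity-weighted attention.

    h_fwd, h_bwd: arrays of shape (T, d), one hidden state per time step.
    Returns the fused states of shape (T, 2 * d) and the (T, T) attention weights.
    Assumption: cosine similarity + softmax + concatenation (illustrative only).
    """
    # Normalize each hidden state so the dot product is a cosine similarity.
    f_unit = h_fwd / (np.linalg.norm(h_fwd, axis=1, keepdims=True) + eps)
    b_unit = h_bwd / (np.linalg.norm(h_bwd, axis=1, keepdims=True) + eps)

    # sim[t, s] = cosine similarity between forward step t and backward step s.
    sim = f_unit @ b_unit.T

    # Each forward step attends over all backward steps.
    weights = softmax(sim, axis=1)

    # Weighted sum of backward states gives a context vector per forward step.
    context = weights @ h_bwd

    # Fuse by concatenating each forward state with its backward context.
    fused = np.concatenate([h_fwd, context], axis=1)
    return fused, weights


# Stand-in hidden states; a real model would take these from the two decoders.
rng = np.random.default_rng(0)
T, d = 5, 8
h_fwd = rng.standard_normal((T, d))
h_bwd = rng.standard_normal((T, d))

fused, weights = fuse_hidden_states(h_fwd, h_bwd)
print(fused.shape)  # (5, 16)
```

Concatenation keeps the forward state intact alongside the backward context. An additive or gated fusion would be an equally plausible choice under the same assumptions.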
Pages: 134-143
Number of pages: 10