Image Caption Generation Using Contextual Information Fusion With Bi-LSTM-s

Cited by: 12
Authors
Zhang, Huawei [1 ]
Ma, Chengbo [1 ]
Jiang, Zhanjun [1 ]
Lian, Jing [1 ]
Affiliations
[1] Lanzhou Jiaotong Univ, Elect & Informat Engn, Lanzhou 730000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Semantics; Visualization; Data mining; Decoding; Task analysis; Logic gates; Bi-LSTM; image caption generation; semantic fusion; semantic similarity;
DOI
10.1109/ACCESS.2022.3232508
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image caption generation requires expressing image content in accurate natural language. In the existing encoder-decoder structure, the decoder generates words one at a time in front-to-back order and cannot analyze the full contextual information. This paper employs a Bi-LSTM (bidirectional long short-term memory) structure, which draws on both past and subsequent information, so that image content is predicted from context clues in both directions. The visual information is fed separately into the F-LSTM decoder (forward LSTM decoder) and the B-LSTM decoder (backward LSTM decoder) to extract semantic information, and the two semantic outputs complement each other. Specifically, a subsidiary attention mechanism, S-Att, acts between F-LSTM and B-LSTM and extracts the semantic information of both decoders. Meanwhile, the semantic interaction is extracted according to similarity while the hidden states are aligned, and the fused semantic information is output. The resulting Bi-LSTM-s model extracts contextual information and achieves finer-grained image captioning. Our model improves on the original LSTM by 9.7%. In addition, it effectively resolves the inconsistency of forward and backward semantic information in the simultaneous order, and achieves a BLEU-4 score of 37.5. The superiority of this approach is demonstrated experimentally on the MSCOCO dataset.
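The abstract describes aligning the hidden states of the forward and backward decoders and fusing them according to their similarity, but the record gives no formulas. The following is only a minimal sketch of that general idea, not the authors' S-Att mechanism: the function names, the cosine-similarity measure, and the gating rule are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (T, d) arrays."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den


def fuse_hidden_states(h_fwd, h_bwd):
    """Fuse forward and backward hidden states with a similarity gate.

    h_fwd: (T, d) forward-decoder states, first word first.
    h_bwd: (T, d) backward-decoder states, last word first.
    Returns the fused (T, d) states and the per-step similarity (T,).
    """
    # Align the two directions so that row t refers to the same word.
    h_bwd_aligned = h_bwd[::-1]

    # Per-step agreement between the two directions, in [-1, 1].
    sim = cosine_similarity(h_fwd, h_bwd_aligned)

    # Hypothetical gating rule (an assumption, not taken from the paper):
    # map similarity to [0, 1]; when the directions agree (gate near 1)
    # average them, when they disagree (gate near 0) keep the forward state.
    gate = (0.5 * (sim + 1.0))[:, None]
    fused = (1.0 - 0.5 * gate) * h_fwd + 0.5 * gate * h_bwd_aligned
    return fused, sim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_f = rng.normal(size=(5, 8))  # 5 time steps, hidden size 8
    h_b = rng.normal(size=(5, 8))
    fused, sim = fuse_hidden_states(h_f, h_b)
    print(fused.shape, sim.round(2))
```

In a trained model the gate would more plausibly be learned (for example through attention weights) rather than fixed as above; the sketch only shows where alignment and similarity enter the fusion.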
Pages: 134 - 143
Number of pages: 10
Related papers
50 in total
  • [1] A PARALLEL-FUSION RNN-LSTM ARCHITECTURE FOR IMAGE CAPTION GENERATION
    Wang, Minsi
    Song, Li
    Yang, Xiaokang
    Luo, Chuanfei
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 4448 - 4452
  • [2] Recurrent Attention LSTM Model for Image Chinese Caption Generation
    Zhang, Chaoying
    Dai, Yaping
    Cheng, Yanyan
    Jia, Zhiyang
    Hirota, Kaoru
    2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 808 - 813
  • [3] Image Caption Generation with Hierarchical Contextual Visual Spatial Attention
    Khademi, Mahmoud
    Schulte, Oliver
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2024 - 2032
  • [4] Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction
    Sasibhooshan, Reshmi
    Kumaraswamy, Suresh
    Sasidharan, Santhoshkumar
    JOURNAL OF BIG DATA, 2023, 10 (01)
  • [5] Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction
    Reshmi Sasibhooshan
    Suresh Kumaraswamy
    Santhoshkumar Sasidharan
    Journal of Big Data, 10
  • [6] Boosting image caption generation with feature fusion module
    Pengfei Xia
    Jingsong He
    Jin Yin
    Multimedia Tools and Applications, 2020, 79 : 24225 - 24239
  • [7] Boosting image caption generation with feature fusion module
    Xia, Pengfei
    He, Jingsong
    Yin, Jin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (33-34) : 24225 - 24239
  • [8] A Semantic Driven CNN - LSTM Architecture for Personalised Image Caption Generation
    Ignatious, Abisha Anto L.
    Jeevitha, S.
    Madhurambigai, M.
    Hemalatha, M.
    2019 11TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC 2019), 2019, : 356 - 362
  • [9] Image Caption Generation with Local Semantic and Global Information
    Liu, Xing
    Liu, Weibin
    Xing, Weiwei
    2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019, : 680 - 685
  • [10] Image Caption Generation Using Multi-Level Semantic Context Information
    Tian, Peng
    Mo, Hongwei
    Jiang, Laihao
    SYMMETRY-BASEL, 2021, 13 (07):