Image Caption Generation Using Contextual Information Fusion With Bi-LSTM-s

Times cited: 12
Authors
Zhang, Huawei [1]
Ma, Chengbo [1]
Jiang, Zhanjun [1]
Lian, Jing [1]
Affiliation
[1] Lanzhou Jiaotong Univ, Elect & Informat Engn, Lanzhou 730000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Semantics; Visualization; Data mining; Decoding; Task analysis; Logic gates; Bi-LSTM; image caption generation; semantic fusion; semantic similarity;
DOI
10.1109/ACCESS.2022.3232508
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image caption generation requires expressing image content in accurate natural language. In the existing encoder-decoder structure, the decoder generates words one at a time in front-to-back order and therefore cannot analyze the full contextual information. This paper employs a Bi-LSTM (Bi-directional Long Short-Term Memory) structure, which draws not only on past information but also on subsequent information, so that image content is predicted from context clues in both directions. The visual information is fed separately into the F-LSTM decoder (forward LSTM decoder) and the B-LSTM decoder (backward LSTM decoder) to extract semantic information, and the two semantic outputs complement each other. Specifically, an auxiliary attention mechanism, S-Att, acts between F-LSTM and B-LSTM and extracts the semantic information of both decoders through attention. Meanwhile, the semantic interaction is extracted according to the similarity between the aligned hidden states, and the fused semantic information is output. The resulting Bi-LSTM-s model extracts contextual information and effectively produces finer-grained image captions. Our model improves on the original LSTM by 9.7%. In addition, it effectively resolves the inconsistency between the forward and backward semantic information at the same time step, and achieves a BLEU-4 score of 37.5. Experiments on the MSCOCO dataset demonstrate the superiority of this approach.
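The abstract does not give the S-Att equations, so the following is only a minimal NumPy sketch of the general idea under stated assumptions. A forward and a backward LSTM each read the same teacher-forced caption (as during training) together with a global image feature. The backward states are re-aligned to word positions, each forward state attends over the backward states using scaled dot-product similarity, and the two are fused through a tanh projection. The function names (`run_lstm`, `bidirectional_fusion`), the dimensions, the dot-product scoring, and the concatenate-then-project fusion are all hypothetical choices for illustration, not the authors' Bi-LSTM-s formulation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(scores, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def init_lstm(rng, input_dim, hidden_dim, scale=0.1):
    """Random parameters for one LSTM layer; gates are stacked as i, f, o, g."""
    return {
        "W": rng.normal(0.0, scale, (4 * hidden_dim, input_dim)),
        "U": rng.normal(0.0, scale, (4 * hidden_dim, hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }


def run_lstm(params, inputs):
    """Run a standard LSTM over a (T, input_dim) sequence; return (T, hidden_dim) states."""
    hidden_dim = params["U"].shape[1]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    states = []
    for x in inputs:
        z = params["W"] @ x + params["U"] @ h + params["b"]
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)


def bidirectional_fusion(word_embs, image_feat, fwd, bwd, W_fuse):
    """Hypothetical forward/backward decoder fusion via similarity-based attention."""
    T = word_embs.shape[0]
    # Assumption: every step sees its word embedding concatenated with the image feature.
    inputs = np.concatenate([word_embs, np.tile(image_feat, (T, 1))], axis=1)
    h_f = run_lstm(fwd, inputs)              # forward decoder reads positions 0..T-1
    h_b = run_lstm(bwd, inputs[::-1])[::-1]  # backward decoder reads T-1..0, then re-aligned
    # Each forward state attends over all backward states (scaled dot-product similarity).
    scores = h_f @ h_b.T / np.sqrt(h_f.shape[1])  # (T, T)
    weights = softmax(scores, axis=1)
    context = weights @ h_b                        # (T, hidden_dim)
    # Fuse forward semantics with the attended backward semantics.
    fused = np.tanh(np.concatenate([h_f, context], axis=1) @ W_fuse.T)
    return fused, weights


# Toy usage with random inputs (all sizes are arbitrary).
rng = np.random.default_rng(0)
T, E, V, H = 5, 8, 8, 16
word_embs = rng.normal(size=(T, E))
image_feat = rng.normal(size=V)
fwd = init_lstm(rng, E + V, H)
bwd = init_lstm(rng, E + V, H)
W_fuse = rng.normal(0.0, 0.1, (H, 2 * H))
fused, weights = bidirectional_fusion(word_embs, image_feat, fwd, bwd, W_fuse)
print(fused.shape, weights.shape)  # (5, 16) (5, 5)
```

Concatenating the image feature to every input is just one simple way to condition both decoders on the same visual information; the paper may inject it differently, and at inference time each decoder would generate its own caption rather than read a shared ground-truth one.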
Pages: 134-143
Number of pages: 10