A comprehensive construction of deep neural network-based encoder-decoder framework for automatic image captioning systems

Authors
Rahman, Md Mijanur [1]
Uzzaman, Ashik [1]
Sami, Sadia Islam [1]
Khatun, Fatema [2]
Bhuiyan, Md Al-Amin [3]
Affiliations
[1] Department of Computer Science and Engineering, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh 2224, Bangladesh
[2] Department of Electrical and Electronic Engineering, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Dhaka, Bangladesh
[3] Department of Computer Engineering, King Faisal University, Al Hufuf, Al Ahsa, Saudi Arabia
Keywords
CNN Encoder; Deep Learning; Image Captioning; Image Feature Extractor; LSTM Decoder; Pre-trained VGG-19 Model; Generation; Datasets
DOI
10.1049/ipr2.13287
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study introduces a novel encoder-decoder framework based on deep neural networks and provides a thorough investigation of automatic image captioning systems. The proposed model uses a convolutional neural network (CNN) as the encoder, which is well suited to recognizing objects and retaining spatial information, and a long short-term memory (LSTM) network as the decoder, which predicts words and constructs sentences. The pre-trained VGG-19 model serves as the image feature extractor, while the LSTM network acts as a sequence processor that generates a fixed-length output vector for the final predictions. Images from open-access datasets, including Flickr8k, Flickr30k, and MS COCO, are used for both training and testing. The system is implemented in Python using Keras with TensorFlow as the backend. Experimental results, evaluated with the bilingual evaluation understudy (BLEU) metric, demonstrate the effectiveness of the proposed method for automatic image captioning. By accounting for spatial relationships in images and producing coherent, contextually relevant captions, the paper advances image captioning technology. The difficulties encountered during experimentation are discussed, and they suggest directions for future research. By establishing a robust neural network architecture for automatic image captioning, this study lays the groundwork for further advances in the field.
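The abstract reports results with the BLEU metric but does not give the scoring code. As a hedged illustration only, the sketch below implements standard sentence-level BLEU (clipped n-gram precision, combined by a geometric mean and multiplied by a brevity penalty) in plain Python. The function names `ngrams` and `sentence_bleu`, the uniform weights, the absence of smoothing, and the whitespace tokenization in the usage lines are assumptions for illustration, not the authors' evaluation setup; reported captioning scores are often computed at corpus level with an established library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, candidate, max_n=4):
    """Sentence-level BLEU for one candidate caption against several references.

    Assumption (illustrative): uniform weights 1/max_n and no smoothing,
    so any zero n-gram precision yields a score of 0.0.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short to contain any n-gram of this order
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: compare with the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


if __name__ == "__main__":
    refs = [s.split() for s in ["a dog runs on the grass", "a brown dog is running outside"]]
    print(sentence_bleu(refs, "a dog runs on the grass".split()))  # 1.0
    cat_ref = ["the cat sat on the mat".split()]
    print(round(sentence_bleu(cat_ref, "the cat the cat".split(), max_n=1), 4))  # 0.4549
```

The second usage line shows why clipping matters: "the cat the cat" matches three of its four unigrams after clipping ("cat" appears only once in the reference), and the short candidate is further penalized by the brevity factor.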
Pages: 4778-4798
Number of pages: 21