A comprehensive construction of deep neural network-based encoder-decoder framework for automatic image captioning systems

Cited by: 0
Authors
Rahman, Md Mijanur [1]
Uzzaman, Ashik [1]
Sami, Sadia Islam [1]
Khatun, Fatema [2]
Bhuiyan, Md Al-Amin [3]
Affiliations
[1] Jatiya Kabi Kazi Nazrul Islam Univ, Dept Comp Sci & Engn, Mymensingh 2224, Bangladesh
[2] Bangabandhu Sheikh Mujibur Rahman Sci & Technol Un, Dept Elect & Elect Engn, Gopalganj, Dhaka, Bangladesh
[3] King Faisal Univ, Dept Comp Engn, Al Hufuf, Al Ahsa, Saudi Arabia
Keywords
CNN Encoder; Deep Learning; Image Captioning; Image Feature Extractor; LSTM Decoder; Pre-trained VGG-19 Model; GENERATION; DATASETS;
DOI
10.1049/ipr2.13287
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study introduces a novel encoder-decoder framework based on deep neural networks for automatic image captioning, together with a thorough investigation of the field. The proposed model uses a convolutional neural network (CNN) encoder, which is well suited to object recognition and to retaining spatial information, and a long short-term memory (LSTM) decoder for word prediction and sentence construction. The VGG-19 model serves as the image feature extractor, while the LSTM network acts as a sequence processor that generates a fixed-length output vector for the final predictions. Training and testing use a variety of images from the open-access Flickr8k, Flickr30k, and MS COCO datasets. The implementation is in Python, using Keras and TensorFlow. The experimental results, evaluated with the bilingual evaluation understudy (BLEU) metric, demonstrate the effectiveness of the proposed method for automatic image captioning. By addressing spatial relationships in images and producing coherent, contextually relevant captions, the paper advances image captioning technology. The discussion of the difficulties encountered during experimentation suggests directions for future research. By establishing a robust neural network architecture for automatic image captioning, the study opens opportunities for further progress in the area.
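For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). It is an illustration only, not the authors' evaluation code: the function names (`ngrams`, `modified_precision`, `sentence_bleu`) are hypothetical, uniform n-gram weights are assumed, and no smoothing is applied, so a caption with no matching n-gram at any order scores 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip candidate counts so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n-grams (an assumption)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_mean)
```

A generated caption would be tokenized (for example with `caption.split()`) and scored against the tokenized human captions for the same image, e.g. `sentence_bleu(generated_tokens, [ref1_tokens, ref2_tokens])`. Published results are normally computed at corpus level with an established toolkit, so scores from this sketch are not directly comparable to those reported in the paper.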
Pages: 4778 - 4798
Page count: 21
Related papers
50 in total
  • [21] Iterative Deep Convolutional Encoder-Decoder Network for Medical Image Segmentation
    Kim, Jung Uk
    Kim, Hak Gu
    Ro, Yong Man
    2017 39TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2017, : 685 - 688
  • [22] Semantic Segmentation of Remote Sensing Image Based on Encoder-Decoder Convolutional Neural Network
    Zhang Zhehan
    Fang Wei
    Du Lili
    Qiao Yanli
    Zhang Dongying
    Ding Guoshen
    ACTA OPTICA SINICA, 2020, 40 (03)
  • [23] Skin lesion segmentation using an improved framework of encoder-decoder based convolutional neural network
    Kaur, Ranpreet
    GholamHosseini, Hamid
    Sinha, Roopak
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2022, 32 (04) : 1143 - 1158
  • [24] Encoder-decoder based convolutional neural networks for image forgery detection
    El Biach, Fatima Zahra
    Iala, Imad
    Laanaya, Hicham
    Minaoui, Khalid
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (16) : 22611 - 22628
  • [25] The local ternary pattern encoder-decoder neural network for dental image segmentation
    Salih, Omran
    Duffy, Kevin Jan
    IET IMAGE PROCESSING, 2022, 16 (06) : 1520 - 1530
  • [26] Time frequency masking based speech enhancement using deep encoder-decoder neural network
    Shi, Wenhua
    Zhang, Xiongwei
    Zou, Xia
    Sun, Meng
    Li, Li
    Shengxue Xuebao/Acta Acustica, 2020, 45 (03) : 299 - 307
  • [27] MVCT image enhancement using reference-based encoder-decoder convolutional neural network
    Jin, Shuang
    Xu, Xiaotong
    Su, Zhe
    Tang, Long
    Zheng, Mengxun
    Liang, Peiwen
    Zhang, Hua
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 241
  • [28] Encoder-decoder based convolutional neural networks for image forgery detection
    Fatima Zahra El Biach
    Imad Iala
    Hicham Laanaya
    Khalid Minaoui
    Multimedia Tools and Applications, 2022, 81 : 22611 - 22628
  • [29] Single image deraining via deep residual attention and encoder-decoder network
    Wei, Mingrun
    Wang, Hongjuan
    Cheng, Ru
    Yu, Yue
    Wang, Lukun
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (03) : 3453 - 3467
  • [30] Deep Encoder-Decoder Network-Based Wildfire Segmentation Using Drone Images in Real-Time
    Muksimova, Shakhnoza
    Mardieva, Sevara
    Cho, Young-Im
    REMOTE SENSING, 2022, 14 (24)