A comprehensive construction of deep neural network-based encoder-decoder framework for automatic image captioning systems

Cited by: 0
Authors
Rahman, Md Mijanur [1]
Uzzaman, Ashik [1]
Sami, Sadia Islam [1]
Khatun, Fatema [2]
Bhuiyan, Md Al-Amin [3]
Affiliations
[1] Jatiya Kabi Kazi Nazrul Islam Univ, Dept Comp Sci & Engn, Mymensingh 2224, Bangladesh
[2] Bangabandhu Sheikh Mujibur Rahman Sci & Technol Un, Dept Elect & Elect Engn, Gopalganj, Dhaka, Bangladesh
[3] King Faisal Univ, Dept Comp Engn, Al Hufuf, Al Ahsa, Saudi Arabia
Keywords
CNN Encoder; Deep Learning; Image Captioning; Image Feature Extractor; LSTM Decoder; Pre-trained VGG-19 Model; GENERATION; DATASETS
DOI
10.1049/ipr2.13287
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study introduces a novel encoder-decoder framework based on deep neural networks and provides a thorough investigation of automatic image captioning systems. The proposed model uses a convolutional neural network (CNN) as the encoder, which is well suited to object recognition and retains spatial information, and a long short-term memory (LSTM) network as the decoder for word prediction and sentence construction. The VGG-19 model serves as the image feature extractor, while the LSTM network functions as a sequence processor that generates a fixed-length output vector for the final predictions. For both training and testing, the study uses a variety of images from the open-access datasets Flickr8k, Flickr30k, and MS COCO. The implementation is in Python with Keras and TensorFlow. The experimental results, evaluated with the bilingual evaluation understudy (BLEU) metric, demonstrate the effectiveness of the proposed method for automatic image captioning. By addressing spatial relationships in images and producing coherent, contextually relevant captions, the paper advances image captioning technology. A discussion of the difficulties encountered during experimentation suggests directions for future work. By establishing a robust neural network architecture for automatic image captioning, the study opens opportunities for further advancement and improvement in the area.
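The abstract names BLEU as the evaluation metric without giving details. As a rough illustration only, the following is a minimal sketch of the standard unsmoothed sentence-level BLEU definition (clipped n-gram precision combined with a brevity penalty). It is not the authors' evaluation code; the function names, the whitespace tokenization, and the example captions are assumptions made for this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    c = len(candidate)
    # Brevity penalty uses the reference length closest to the candidate length
    # (ties broken in favour of the shorter reference).
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical captions, tokenized by whitespace for illustration.
reference = "a dog runs across the grass".split()
candidate = "a dog runs on the grass".split()
bleu_2 = sentence_bleu(candidate, [reference], max_n=2)
bleu_4 = sentence_bleu(candidate, [reference], max_n=4)
```

In this example the candidate shares no 4-gram with the reference, so the unsmoothed BLEU-4 score is zero while BLEU-2 is not. This is one reason corpus-level or smoothed variants are commonly reported for short captions.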
Pages: 4778-4798
Page count: 21
Related papers
50 in total
  • [31] Blur kernel estimation method based on deep encoder-decoder network
    Yu X.-Y.
    Xie W.
    Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2020, 37 (04): 731-738
  • [32] A Method of CT Image Denoising Based on Residual Encoder-Decoder Network
    Liu, Yali
    JOURNAL OF HEALTHCARE ENGINEERING, 2021, 2021: 2384493
  • [33] ATTENTION-BASED ENCODER-DECODER NETWORK FOR SINGLE IMAGE DEHAZING
    Gao, Shunan
    Zhu, Jinghua
    Xi, Heran
    2021 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2021,
  • [34] Controllable Image Caption Generation Based on Encoder-decoder for Power Construction Scene
    Yang R.
    Shao J.
    Luo Y.
    Bai W.
    Dianwang Jishu/Power System Technology, 2022, 46 (07): 2572-2580
  • [35] Feature Extraction and Generation of Robot Writing Motion Using Encoder-Decoder Based Deep Neural Network
    Kamigaki, Masahiro
    Katsura, Seiichiro
    2020 IEEE 16TH INTERNATIONAL WORKSHOP ON ADVANCED MOTION CONTROL (AMC), 2020: 121-126
  • [36] A Comparative Evaluation of Transformer-Based Vision Encoder-Decoder Models for Brazilian Portuguese Image Captioning
    Bromonschenkel, Gabriel
    Oliveira, Hilark
    Paixao, Thiago M.
    2024 37TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES, SIBGRAPI 2024, 2024: 235-240
  • [37] Underwater image restoration using deep encoder-decoder network with symmetric skip connections
    Gangisetty, Shankar
    Rai, Raghu Raj
    SIGNAL IMAGE AND VIDEO PROCESSING, 2022, 16 (01): 247-255
  • [38] Optimized encoder-decoder cascaded deep convolutional network for leaf disease image segmentation
    Femi, David
    Mukunthan, Manapakkam Anandan
    NETWORK-COMPUTATION IN NEURAL SYSTEMS, 2024,
  • [39] Audio denoising using Encoder-Decoder Deep Neural Network in the case of HF radio
    Cubrilovic, Sara
    Kuzmanovic, Zvezdana
    Kvascev, Goran
    2024 23RD INTERNATIONAL SYMPOSIUM INFOTEH-JAHORINA, INFOTEH, 2024,
  • [40] Wavelet-Based Deep Auto Encoder-Decoder (WDAED)-Based Image Compression
    Mishra, Dipti
    Singh, Satish Kumar
    Singh, Rajat Kumar
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (04): 1452-1462