A comprehensive construction of deep neural network-based encoder-decoder framework for automatic image captioning systems

Cited by: 0
Authors
Rahman, Md Mijanur [1]
Uzzaman, Ashik [1]
Sami, Sadia Islam [1]
Khatun, Fatema [2]
Bhuiyan, Md Al-Amin [3]
Affiliations
[1] Jatiya Kabi Kazi Nazrul Islam Univ, Dept Comp Sci & Engn, Mymensingh 2224, Bangladesh
[2] Bangabandhu Sheikh Mujibur Rahman Sci & Technol Univ, Dept Elect & Elect Engn, Gopalganj, Dhaka, Bangladesh
[3] King Faisal Univ, Dept Comp Engn, Al Hufuf, Al Ahsa, Saudi Arabia
Keywords
CNN Encoder; Deep Learning; Image Captioning; Image Feature Extractor; LSTM Decoder; Pre-trained VGG-19 Model; Generation; Datasets
DOI
10.1049/ipr2.13287
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This study introduces a novel encoder-decoder framework based on deep neural networks and investigates the field of automatic image captioning in depth. The proposed model uses a convolutional neural network (CNN) as the encoder, which recognizes objects and retains spatial information, and a long short-term memory (LSTM) network as the decoder, which predicts words and constructs sentences. The pre-trained VGG-19 model serves as the image feature extractor, while the LSTM acts as a sequence processor that produces a fixed-length output vector for the final predictions. Images from the open-access Flickr8k, Flickr30k, and MS COCO datasets are used for both training and testing. The system is implemented in Python using Keras with a TensorFlow backend. Experimental results, evaluated with the bilingual evaluation understudy (BLEU) metric, demonstrate the effectiveness of the proposed method for automatic image captioning. By accounting for spatial relationships in images and producing coherent, contextually relevant captions, the work advances image captioning technology. A discussion of the difficulties encountered during experimentation points to directions for future research. By establishing a robust neural network architecture for automatic image captioning, this study opens opportunities for further advances in the area.
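The abstract describes caption generation as word-by-word prediction by an LSTM decoder conditioned on VGG-19 image features, scored with BLEU. The following is a minimal, dependency-free sketch of those steps, not the authors' Keras code. Several pieces are assumptions. The `startseq`/`endseq` boundary tokens, the helper names, and the toy lookup-table "model" are all hypothetical. Decoding is greedy, which is one common strategy, but the abstract does not say which strategy the paper uses. BLEU is computed as unsmoothed sentence-level BLEU with uniform 1- to 4-gram weights, which may differ from the paper's exact BLEU configuration.

```python
import math
from collections import Counter

# Assumed sentence-boundary tokens (a common convention; not stated in the paper).
START, END = "startseq", "endseq"


def training_pairs(caption):
    """Expand one caption into (prefix, next_word) pairs: the decoder is trained to
    predict the next word from the image features plus the words generated so far."""
    tokens = [START] + caption.lower().split() + [END]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def greedy_caption(image_features, next_word_probs, max_len=20):
    """Greedy decoding: append the most probable next word until END or max_len.
    `next_word_probs(features, prefix) -> {word: prob}` is a hypothetical stand-in
    for the trained VGG-19 encoder + LSTM decoder."""
    prefix = [START]
    for _ in range(max_len):
        probs = next_word_probs(image_features, prefix)
        word = max(probs, key=probs.get)
        if word == END:
            break
        prefix.append(word)
    return prefix[1:]  # drop the START token


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU (Papineni et al., 2002): brevity penalty times
    the geometric mean of clipped n-gram precisions, uniform weights for n = 1..max_n."""
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # a zero precision makes the unsmoothed geometric mean zero
        log_sum += math.log(clipped / sum(cand.values())) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


# Hypothetical toy "model": a lookup table that always picks one next word.
TOY_NEXT = {"startseq": "dog", "dog": "runs", "runs": "on", "on": "grass", "grass": "endseq"}


def toy_model(_features, prefix):
    return {TOY_NEXT[prefix[-1]]: 1.0}


refs = [["a", "dog", "runs", "on", "grass"], ["a", "dog", "runs", "on", "the", "grass"]]
caption = greedy_caption(None, toy_model)
print(" ".join(caption))                       # dog runs on grass
print(round(sentence_bleu(caption, refs), 4))  # 0.7788 (all precisions 1.0; BP = exp(1 - 5/4))
```

In a real system, `next_word_probs` would presumably wrap the trained Keras model's prediction on the VGG-19 feature vector and the tokenized prefix. Keeping the decoder behind a plain function makes it easy to swap greedy search for beam search without touching the scoring code.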
Pages: 4778-4798
Number of pages: 21
Related papers (50 in total)
  • [41] Manipulating Retinal OCT data for Image Segmentation based on Encoder-Decoder Network
    Song, Mingue
    Kim, Yanggon
    PROCEEDINGS OF THE 2021 15TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM 2021), 2021.
  • [42] Robust encoder-decoder learning framework for offline handwritten mathematical expression recognition based on a multi-scale deep neural network
    Shan, Guangcun
    Wang, Hongyu
    Liang, Wei
    Chen, Kai
    SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (03)
  • [44] Service Function Migration Scheduling based on Encoder-Decoder Recurrent Neural Network
    Hirayama, Takahiro
    Miyazawa, Takaya
    Jibiki, Masahiro
    Kafle, Ved P.
    PROCEEDINGS OF THE 2019 IEEE CONFERENCE ON NETWORK SOFTWARIZATION (NETSOFT 2019), 2019: 193-197.
  • [45] Finger-Vein Image Inpainting Based on an Encoder-Decoder Generative Network
    Li, Dan
    Guo, Xiaojing
    Zhang, Haigang
    Jia, Guimin
    Yang, Jinfeng
    PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT I, 2018, 11256: 87-97.
  • [46] Encoder-Decoder Convolutional Neural Network based Iris-Sclera Segmentation
    Sahin, Gurkan
    Susuz, Orkun
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019.
  • [47] Convolutional neural network based encoder-decoder architectures for semantic segmentation of plants
    Kolhar, Shrikrishna
    Jagtap, Jayant
    ECOLOGICAL INFORMATICS, 2021, 64
  • [48] End-to-End Deep Background Subtraction based on Encoder-Decoder Network
    Le, Duy H.
    Pham, Tuan, V
    PROCEEDINGS OF 2019 6TH NATIONAL FOUNDATION FOR SCIENCE AND TECHNOLOGY DEVELOPMENT (NAFOSTED) CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS), 2019: 381-386.
  • [49] Predicting Solar Performance Ratio Based on Encoder-Decoder Neural Network Model
    Yen, Chih-Feng
    Hsieh, He-Yen
    Su, Kuan-Wu
    Leu, Jenq-Shiou
    2019 11TH INTERNATIONAL CONGRESS ON ULTRA MODERN TELECOMMUNICATIONS AND CONTROL SYSTEMS AND WORKSHOPS (ICUMT), 2019.
  • [50] An road extraction method for remote sensing image based on Encoder-Decoder network
    He, H.
    Wang, S.
    Yang, D.
    Wang, S.
    Liu, X.
    Cehui Xuebao/Acta Geodaetica et Cartographica Sinica, 2019, 48 (03): 330-338.