A comprehensive literature review on image captioning methods and metrics based on deep learning technique

Cited by: 3
Authors
Al-Shamayleh, Ahmad Sami [1]
Adwan, Omar [2]
Alsharaiah, Mohammad A. [1]
Hussein, Abdelrahman H. [3]
Kharma, Qasem M. [4]
Eke, Christopher Ifeanyi [5]
Affiliations
[1] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Data Sci & Artificial Intelligence, Amman 19328, Jordan
[2] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Comp Sci, Amman 19328, Jordan
[3] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Networks & Informat Secur, Amman 19328, Jordan
[4] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Software Engn, Amman 19328, Jordan
[5] Fed Univ Lafia, Fac Comp, Dept Comp Sci, PMB 146, Lafia, Nasarawa State, Nigeria
Keywords
Image caption; Natural language processing; Deep learning; Computer vision; Attention; Network; Model
DOI
10.1007/s11042-024-18307-8
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Image captioning is one of the trending areas of study in artificial intelligence. It is the process of generating descriptive text for the visual objects, image metadata, or entities present in an image. By integrating computer vision and Natural Language Processing (NLP), it extracts features from the image, uses them to identify objects, actions, and the relationships among them, and then generates image descriptions. It is not only an extremely important task in computer vision research but also a very difficult one. Much work has been conducted on image captioning methods that use deep learning. The goal of this article is to discover, evaluate, and summarize the works that examine deep learning applications in the context of image captioning systems. Using a systematic literature review (SLR) technique, we found 548 papers, of which 38 were identified as primary studies and underwent in-depth analysis. The results of this review show that LSTM, CNN, and RNN are the most commonly employed deep learning techniques for image captioning. The most popular datasets in the selected primary studies are MS COCO, Flickr8k, and Flickr30k. These are standardized benchmark datasets that researchers use to compare their methods on common test beds. The review also shows that BLEU, CIDEr, SPICE, METEOR, and ROUGE-L are the most frequently employed evaluation metrics according to the findings of this SLR. Despite the considerable advances achieved by deep learning approaches in this domain, there is still room for improvement. Finally, the review outlines directions for future research on image captioning systems. We believe that this SLR will serve as a reference for other researchers and encourage them to gather the most recent data for their own study evaluations.
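BLEU, the first of the evaluation metrics named in the abstract, scores a generated caption by its clipped n-gram precision against one or more human reference captions, multiplied by a brevity penalty that punishes captions shorter than the references. The following is a minimal, unsmoothed sentence-level sketch of that standard definition, assuming pre-tokenized input; it is not code from this review, and the function names `ngrams` and `sentence_bleu` are illustrative. Widely used toolkits compute corpus-level BLEU and may tokenize or smooth differently, so scores from this sketch will not match them exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short to contain any n-gram of this order
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "a dog runs on the grass".split()
    print(sentence_bleu(ref, [ref]))  # 1.0 for an exact match
    print(sentence_bleu("a dog runs on the".split(), [ref]))  # penalized for brevity
```

Clipping is what stops a degenerate caption such as "the the the the the the the" from earning full unigram precision against "the cat is on the mat": only two of its seven tokens are credited.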
Pages: 34219-34268
Number of pages: 50