A comprehensive literature review on image captioning methods and metrics based on deep learning technique

Cited by: 3
Authors
Al-Shamayleh, Ahmad Sami [1 ]
Adwan, Omar [2 ]
Alsharaiah, Mohammad A. [1 ]
Hussein, Abdelrahman H. [3 ]
Kharma, Qasem M. [4 ]
Eke, Christopher Ifeanyi [5 ]
Affiliations
[1] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Data Sci & Artificial Intelligence, Amman 19328, Jordan
[2] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Comp Sci, Amman 19328, Jordan
[3] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Networks & Informat Secur, Amman 19328, Jordan
[4] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Software Engn, Amman 19328, Jordan
[5] Fed Univ Lafia, Fac Comp, Dept Comp Sci, PMB 146, Lafia, Nasarawa State, Nigeria
Keywords
Image caption; Natural language processing; Deep learning; Computer vision; ATTENTION; NETWORK; MODEL
DOI
10.1007/s11042-024-18307-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image captioning is one of the trending areas of study in artificial intelligence. It is the process of creating descriptive information for the visual objects, image metadata, or entities present in an image. By integrating computer vision and Natural Language Processing (NLP), it extracts features from the image, uses them to identify objects, actions, and the relationships among them, and generates image descriptions. It is both an extremely important and a very difficult task in computer vision research. Much work has been conducted on image captioning methods that use a deep learning approach. The goal of this article is to discover, evaluate, and summarize the works that examine deep learning applications in the context of image captioning systems. Using a systematic literature review (SLR) technique, we found 548 papers, of which 38 were identified as primary studies and underwent in-depth analysis. The review shows that LSTM, CNN, and RNN are the most commonly employed deep learning techniques for image captioning. The most widely used datasets in the selected primary studies are MS COCO, Flickr8k, and Flickr30k, standardized benchmark datasets that researchers employ to compare their methods on common test beds. The review also shows that BLEU, CIDEr, SPICE, METEOR, and ROUGE-L are the most frequently employed evaluation metrics in the studies covered by this SLR. Despite the considerable advances achieved by deep learning approaches in this domain, there remains room for improvement. Finally, the review outlines future research directions for image captioning systems. We believe that this SLR will serve as a reference for other researchers and an inspiration to gather the most recent data for evaluating their own studies.
Pages: 34219 - 34268
Page count: 50
Related papers
50 in total
  • [1] A comprehensive literature review on image captioning methods and metrics based on deep learning technique
    Ahmad Sami Al-Shamayleh
    Omar Adwan
    Mohammad A. Alsharaiah
    Abdelrahman H. Hussein
    Qasem M. Kharma
    Christopher Ifeanyi Eke
    Multimedia Tools and Applications, 2024, 83 : 34219 - 34268
  • [2] Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods
    Wajid, Mohammad Saif
    Terashima-Marin, Hugo
    Najafirad, Peyman
    Wajid, Mohd Anas
    ENGINEERING REPORTS, 2024, 6 (01)
  • [3] Image Captioning using Deep Learning: A Systematic Literature Review
    Chohan, Murk
    Khan, Adil
    Mahar, Muhammad Saleem
    Hassan, Saif
    Ghafoor, Abdul
    Khan, Mehmood
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (05) : 278 - 286
  • [4] A Comprehensive Survey of Deep Learning for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [5] Image Captioning Methods and Metrics
    Sargar, Omkar
    Kinger, Shakti
    2021 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2021, : 522 - 526
  • [6] A Comprehensive Review of Deep-Learning-Based Methods for Image Forensics
    Camacho, Ivan Castillo
    Wang, Kai
    JOURNAL OF IMAGING, 2021, 7 (04)
  • [7] A Novel Technique for Image Captioning Based on Hierarchical Clustering and Deep Learning
    Rizwan Ur Rahman
    Pavan Kumar
    Aditya Mohan
    Rabia Musheer Aziz
    Deepak Singh Tomar
    SN Computer Science, 6 (4)
  • [8] Deep Learning Approaches on Image Captioning: A Review
    Ghandi, Taraneh
    Pourreza, Hamidreza
    Mahyar, Hamidreza
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [9] A detailed review of prevailing image captioning methods using deep learning techniques
    Deorukhkar, Kalpana
    Ket, Satish
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (01) : 1313 - 1336
  • [10] A detailed review of prevailing image captioning methods using deep learning techniques
    Kalpana Deorukhkar
    Satish Ket
    Multimedia Tools and Applications, 2022, 81 : 1313 - 1336