A comprehensive literature review on image captioning methods and metrics based on deep learning technique

Cited by: 3

Authors:

Al-Shamayleh, Ahmad Sami ^{[1]}

Adwan, Omar ^{[2]}

Alsharaiah, Mohammad A. ^{[1]}

Hussein, Abdelrahman H. ^{[3]}

Kharma, Qasem M. ^{[4]}

Eke, Christopher Ifeanyi ^{[5]}

Affiliations:

[1] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Data Sci & Artificial Intelligence, Amman 19328, Jordan

[2] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Comp Sci, Amman 19328, Jordan

[3] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Networks & Informat Secur, Amman 19328, Jordan

[4] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Software Engn, Amman 19328, Jordan

[5] Fed Univ Lafia, Fac Comp, Dept Comp Sci, PMB 146, Lafia, Nasarawa State, Nigeria

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Vol. 83 / Issue 12

Keywords:

Image caption; Natural Language processing; Deep learning; Computer vision; ATTENTION; NETWORK; MODEL;

DOI:

10.1007/s11042-024-18307-8

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Image captioning is one of the trending areas of study in artificial intelligence. It is the process of creating descriptive information for the visual objects, image metadata, or entities present in an image. Integrating computer vision and Natural Language Processing (NLP), it extracts features from the image, uses them to identify objects, actions, and the relationships among them, and generates image descriptions. It is both an extremely important and a very difficult task in computer vision research. A great deal of work has been conducted on image captioning methods that use a deep learning approach. The goal of this article is to discover, evaluate, and summarize the works that examine deep learning applications in the context of image captioning systems. Using a systematic literature review (SLR) technique, we found 548 papers, of which 38 were identified as primary studies and underwent in-depth analysis. The results of this review show that LSTM, CNN, and RNN are the most commonly employed deep learning techniques for image captioning. The most widely used datasets in the selected primary studies are MS COCO, Flickr8k, and Flickr30k. These are standardized benchmark datasets that researchers use to compare their methods on common test beds. The review also shows that BLEU, CIDEr, SPICE, METEOR, and ROUGE-L are the most frequently employed evaluation metrics according to the findings of this SLR study. Despite the considerable advances achieved by deep learning approaches in this domain, there is still room for improvement. Finally, the review outlines future research directions for image captioning systems. We believe that this SLR will serve as a reference for other researchers and as an inspiration to gather the most recent data for their own study evaluations.


Pages: 34219-34268

Page count: 50

