Hybrid explainable image caption generation using image processing and natural language processing

Cited by: 0
Authors
Mishra, Atul [1]
Agrawal, Anubhav [1]
Bhasker, Shailendra [2]
Affiliations
[1] BML Munjal Univ, Gurgaon, India
[2] Harcourt Butler Tech Univ, Kanpur, India
Keywords
NLP; Image caption generation; CNN; LSTM; InceptionV3; Hyperparameter optimization
DOI
10.1007/s13198-024-02495-5
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
Image caption generation is among the most rapidly growing research areas that combine image processing methodologies with natural language processing (NLP) techniques. An effective combination of image processing and NLP techniques can revolutionize content creation, media analysis, and accessibility. The study proposes a novel model that generates image captions automatically from visual and linguistic features. Visual features are extracted with a Convolutional Neural Network (CNN), and a Long Short-Term Memory (LSTM) network models the linguistic features to generate text. The proposed model is trained on the Microsoft Common Objects in Context (MS COCO) dataset, which contains over 330,000 images with corresponding captions. Various models, including VGGNet + LSTM, ResNet + LSTM, GoogleNet + LSTM, VGGNet + RNN, AlexNet + RNN, and AlexNet + LSTM, were evaluated comprehensively across different batch sizes and learning rates. The assessment used the BLEU-2, METEOR, ROUGE-L, and CIDEr metrics. The proposed method demonstrated competitive performance, suggesting its potential for further exploration and refinement. These findings underscore the importance of careful parameter tuning and model selection in image captioning tasks.
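To make one of the evaluation metrics named in the abstract concrete, the following is a minimal sketch of a sentence-level BLEU-2 score in plain Python. It is not the authors' evaluation code: the function names (`ngram_counts`, `modified_precision`, `bleu2`), the whitespace tokenization, and the absence of smoothing are all assumptions made for illustration. Published results are normally computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def bleu2(candidate, references):
    """Unsmoothed sentence-level BLEU-2 with uniform weights (illustrative only)."""
    p1 = modified_precision(candidate, references, 1)
    p2 = modified_precision(candidate, references, 2)
    if p1 == 0.0 or p2 == 0.0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    c = len(candidate)
    # Reference length closest to the candidate length; ties go to the shorter one.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(0.5 * math.log(p1) + 0.5 * math.log(p2))


reference = "a dog runs on the grass".split()
full_score = bleu2("a dog runs on the grass".split(), [reference])
short_score = bleu2("a dog runs".split(), [reference])
```

In this sketch a caption identical to the reference scores 1.0, while the truncated caption "a dog runs" keeps perfect n-gram precision but is scaled by the brevity penalty exp(1 - 6/3), giving roughly 0.368.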
Pages: 4874-4884
Page count: 11