Domain-specific image captioning: a comprehensive review

Cited by: 1
Authors
Sharma, Himanshu [1]
Padha, Devanand [1]
Affiliations
[1] Cent Univ Jammu, Dept Comp Sci & Informat Technol, Jammu 181124, Jammu & Kashmir, India
Keywords
Computer vision; Deep learning; Medical image captioning; Natural image captioning; Remote sensing image captioning; AUTOMATIC IMAGE; GENERATION; MODELS; RETRIEVAL; SPEECH;
DOI
10.1007/s13735-024-00328-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An image caption is a sentence summarizing the semantic details of an image. Image captioning is a blended application of computer vision and natural language processing. Earlier research addressed this domain with machine learning approaches, modeling image captioning frameworks using hand-engineered feature extraction techniques. With the resurgence of deep-learning approaches, the development of improved and efficient image captioning frameworks is on the rise. Image captioning is witnessing tremendous growth in various domains such as medicine, remote sensing, security, visual assistance, and multimodal search engines. In this survey, we comprehensively study image captioning frameworks based on our proposed domain-specific taxonomy. We explore the benchmark datasets and metrics leveraged for training and evaluating image captioning models in various application domains. In addition, we perform a comparative analysis of the reviewed models. Natural image captioning, medical image captioning, and remote sensing image captioning are currently among the most prominent application domains of image captioning. The efficacy of real-time image captioning remains a challenging obstacle, limiting its implementation in sensitive areas such as visual aid, remote security, and healthcare. Further challenges include the scarcity of rich domain-specific datasets, training complexity, evaluation difficulty, and a deficiency of cross-domain knowledge transfer techniques. Despite the significant contributions made, additional efforts are needed to develop steadfast and influential image captioning models.
Pages: 27