Contextualized Keyword Representations for Multi-modal Retinal Image Captioning

Cited by: 11
Authors
Huang, Jia-Hong [1]
Wu, Ting-Wei [2]
Worring, Marcel [1]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
OPTIC-NERVE; CLASSIFICATION;
DOI
10.1145/3460426.3463667
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical image captioning automatically generates a medical description of the content of a given medical image. Traditional medical image captioning models create a medical description based on a single medical image input only. Hence, an abstract medical description or concept is hard to generate with the traditional approach, which limits the effectiveness of medical image captioning. Multi-modal medical image captioning is one of the approaches used to address this problem. In multi-modal medical image captioning, textual input, e.g., expert-defined keywords, is considered one of the main drivers of medical description generation. Thus, effectively encoding both the textual input and the medical image is important for the task of multi-modal medical image captioning. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed. Contextualized keyword representations, textual feature reinforcement, and masked self-attention are used to develop the proposed approach. Based on an evaluation on an existing multi-modal medical image captioning dataset, experimental results show that the proposed model is effective, with an increase of +53.2% in BLEU-avg and +18.6% in CIDEr compared with the state-of-the-art method.
Pages: 645-652
Page count: 8
Related papers
50 in total
  • [1] Multi-Modal Image Captioning for the Visually Impaired
    Ahsan, Hiba
    Bhalla, Nikita
    Bhatt, Daivat
    Shah, Kaivankumar
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 53 - 60
  • [2] Multi-Modal Graph Aggregation Transformer for image captioning
    Chen, Lizhi
    Li, Kesen
    NEURAL NETWORKS, 2025, 181
  • [3] Multi-modal Dense Video Captioning
    Iashin, Vladimir
    Rahtu, Esa
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4117 - 4126
  • [4] Multi-modal Dependency Tree for Video Captioning
    Zhao, Wentian
    Wu, Xinxiao
    Luo, Jiebo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [5] Boosting Entity-Aware Image Captioning With Multi-Modal Knowledge Graph
    Zhao, Wentian
    Wu, Xinxiao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2659 - 2670
  • [6] Fine-tuning with Multi-modal Entity Prompts for News Image Captioning
    Zhang, Jingjing
    Fang, Shancheng
    Mao, Zhendong
    Zhang, Zhiwei
    Zhang, Yongdong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4365 - 4373
  • [7] Image Captioning with Clause-Focused Metrics in a Multi-Modal Setting for Marketing
    Harzig, Philipp
    Zecha, Dan
    Lienhart, Rainer
    Kaiser, Carolin
    Schallner, Rene
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 419 - 424
  • [8] Learning Cross-modal Representations with Multi-relations for Image Captioning
    Cheng, Peng
    Le, Tung
    Racharak, Teeradaj
    Cao Yiming
    Kong Weikun
    Minh Le Nguyen
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 346 - 353
  • [9] LEARNING OPTIMAL SHAPE REPRESENTATIONS FOR MULTI-MODAL IMAGE REGISTRATION
    Grossiord, Eloise
    Risser, Laurent
    Kanoun, Salim
    Ken, Soleakhena
    Malgouyres, Francois
    2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 722 - 725
  • [10] GLOVE-ING ATTENTION: A MULTI-MODAL NEURAL LEARNING APPROACH TO IMAGE CAPTIONING
    Anundskas, Lars Halvor
    Afridi, Hina
    Tarekegn, Adane Nega
    Yamin, Muhammad Mudassar
    Ullah, Mohib
    Yamin, Saira
    Cheikh, Faouzi Alaya
    2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,