Contextualized Keyword Representations for Multi-modal Retinal Image Captioning

Cited by: 11
Authors
Huang, Jia-Hong [1 ]
Wu, Ting-Wei [2 ]
Worring, Marcel [1 ]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
OPTIC-NERVE; CLASSIFICATION
DOI
10.1145/3460426.3463667
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Medical image captioning automatically generates a description of the content of a given medical image. Traditional medical image captioning models generate a description from a single medical image input only, which makes abstract medical descriptions or concepts difficult to produce and limits the effectiveness of medical image captioning. Multi-modal medical image captioning is one approach to addressing this problem: textual input, e.g., expert-defined keywords, is treated as one of the main drivers of description generation. Effectively encoding both the textual input and the medical image is therefore important for this task. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed, built on contextualized keyword representations, textual feature reinforcement, and masked self-attention. Evaluated on an existing multi-modal medical image captioning dataset, the proposed model improves on the state-of-the-art method by +53.2% in BLEU-avg and +18.6% in CIDEr.
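The masked self-attention that the abstract names can be illustrated with a minimal single-head sketch over a fused sequence of image-region features and keyword embeddings. This is an illustrative NumPy toy, not the authors' implementation; the sequence layout, dimensions, and padding mask are assumptions for the example only.

```python
import numpy as np

def masked_self_attention(x, mask):
    """Single-head scaled dot-product self-attention with a boolean mask.

    x:    (seq_len, d) token features, e.g. image-region features
          concatenated with contextualized keyword embeddings.
    mask: (seq_len, seq_len) boolean; True marks positions that may
          NOT be attended to (e.g. padded keyword slots).
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)          # (seq_len, seq_len) similarities
    scores = np.where(mask, -1e9, scores)  # block masked positions
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                     # context-mixed token features

# Toy multi-modal sequence: 2 image tokens + 2 keyword tokens, one padded.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))
pad = np.zeros((4, 4), dtype=bool)
pad[:, 3] = True                           # last keyword slot is padding
fused = masked_self_attention(tokens, pad)
```

Because the mask sets the padded column's scores to a large negative value before the softmax, that slot receives (effectively) zero attention weight from every token, so padding cannot contaminate the fused representation.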
Pages: 645-652 (8 pages)
Related Papers (50 in total)
  • [31] Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
    Rahman, Tanzila
    Xu, Bicheng
    Sigal, Leonid
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 8907 - 8916
  • [32] MULTI-MODAL IMAGE STITCHING WITH NONLINEAR OPTIMIZATION
    Saha, Arindam
    Maity, Soumyadip
    Bhowmick, Brojeshwar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1987 - 1991
  • [33] Multi-Modal Deformable Medical Image Registration
    Fookes, Clinton
    Sridharan, Sridha
    ICSPCS: 2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, PROCEEDINGS, 2008, : 661 - 669
  • [34] A variational approach to multi-modal image matching
    Chefd'Hotel, C
    Hermosillo, G
    Faugeras, O
    IEEE WORKSHOP ON VARIATIONAL AND LEVEL SET METHODS IN COMPUTER VISION, PROCEEDINGS, 2001, : 21 - 28
  • [35] Event-centric multi-modal fusion method for dense video captioning
    Chang, Zhi
    Zhao, Dexin
    Chen, Huilin
    Li, Jingdan
    Liu, Pengfei
    NEURAL NETWORKS, 2022, 146 : 120 - 129
  • [36] Multi-modal retinal imaging with active retinal tracking and wavefront sensing
    Vienola, Kari V.
    Jonnal, Ravi Sankar
    Migacz, Justin V.
    Gorczynska, Iwona
    Zawadzki, Robert
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2020, 61 (07)
  • [37] Multi-modal Image Fusion with KNN Matting
    Zhang, Xia
    Lin, Hui
    Kang, Xudong
    Li, Shutao
    PATTERN RECOGNITION (CCPR 2014), PT II, 2014, 484 : 89 - 96
  • [38] MixBERT for Multi-modal Matching in Image Advertising
    Yu, Tan
    Li, Xiaokang
    Xie, Jianwen
    Yin, Ruiyang
    Xu, Qing
    Li, Ping
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3597 - 3602
  • [39] Multi-modal retinal imaging for investigating neurovascular health
    Gibbon, Samuel
    Hamid, Charlene
    Threlfall, Adam
    Ritchie, Craig
    Dhillon, Baljean
    Giarratano, Ylenia
    Rashid, Darwon
    Trucco, Emanuele
    Macgillivray, Thomas J.
    EYE, 2024, 38 (SUPPL 2) : 72 - 73
  • [40] A Multi-modal SPM Model for Image Classification
    Zheng, Peng
    Zhao, Zhong-Qiu
    Gao, Jun
    INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2017, PT III, 2017, 10363 : 525 - 535