Deep Fuzzy Multiteacher Distillation Network for Medical Visual Question Answering

Cited by: 1
|
Authors
Liu, Yishu [1 ]
Chen, Bingzhi [2 ]
Wang, Shuihua [3 ]
Lu, Guangming [1 ]
Zhang, Zheng [4 ,5 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Med Biometr Percept & Anal Engn Lab, Shenzhen 518055, Peoples R China
[2] South China Normal Univ, Sch Software, Foshan 528200, Peoples R China
[3] Xian Jiaotong Liverpool Univ, Dept Biol Sci, Suzhou 215123, Peoples R China
[4] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[5] Peng Cheng Laboratory, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Biomedical imaging; Visualization; Fuzzy logic; Uncertainty; Task analysis; Question answering (information retrieval); Transformers; Fuzzy deep learning; fuzzy logic; knowledge distillation (KD); medical visual question answering (VQA);
DOI
10.1109/TFUZZ.2024.3402086
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical visual question answering (medical VQA) is a critical cross-modal interaction task that has garnered considerable attention in the medical domain. Several existing methods commonly leverage vision-and-language pretraining paradigms to mitigate the limitation of small-scale data. Nevertheless, most of them still suffer from two challenges that remain open for further research: 1) Limited research focuses on distilling representations from a complete modality to guide the representation learning of masked data in the other modality. 2) Multimodal fusion based on self-attention mechanisms cannot effectively handle the inherent uncertainty and vagueness of information interaction across modalities. To mitigate these issues, in this article, we propose a novel deep fuzzy multiteacher distillation (DFMD) network for medical VQA, which takes advantage of fuzzy logic to model the uncertainties in vision-language representations across modalities within a multiteacher framework. Specifically, a multiteacher knowledge distillation module is conceived to assist in reconstructing the missing semantics under the supervision signal generated by teachers from the other, complete modality, achieving more robust semantic interaction across modalities. Incorporating insights from fuzzy logic theory, we propose a noise-robust encoder called FuzBERT that enables our DFMD model to reduce the imprecision and ambiguity in feature representation during the multimodal interaction process. To the best of our knowledge, our work is the first attempt to combine fuzzy logic theory with a transformer-based encoder to effectively learn multimodal representations for medical VQA. Experimental results on the VQA-RAD and SLAKE datasets consistently demonstrate the superiority of our proposed DFMD method over state-of-the-art baselines.
Pages: 5413 - 5427
Number of pages: 15
Related Papers
50 records
  • [21] Medical knowledge-based network for Patient-oriented Visual Question Answering
    Jian, Huang
    Chen, Yihao
    Yong, Li
    Yang, Zhenguo
    Gong, Xuehao
    Lee, Wang Fu
    Xu, Xiaohong
    Liu, Wenyin
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (02)
  • [22] VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
    Bolanos, Marc
    Peris, Alvaro
    Casacuberta, Francisco
    Radeva, Petia
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017), 2017, 10255 : 372 - 380
  • [23] Language-aware Visual Semantic Distillation for Video Question Answering
    Zou, Bo
    Yang, Chao
    Qiao, Yu
    Quan, Chengbin
    Zhao, Youjian
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 27103 - 27113
  • [24] A Question-Centric Model for Visual Question Answering in Medical Imaging
    Vu, Minh H.
    Lofstedt, Tommy
    Nyholm, Tufve
    Sznitman, Raphael
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (09) : 2856 - 2868
  • [25] Medical Visual Question Answering via Conditional Reasoning
    Zhan, Li-Ming
    Liu, Bo
    Fan, Lu
    Chen, Jiaxin
    Wu, Xiao-Ming
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2345 - 2354
  • [26] MMQL: Multi-Question Learning for Medical Visual Question Answering
    Chen, Qishen
    Bian, Minjie
    Xu, Huahu
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005 : 480 - 489
  • [27] Automated Medical Report Generation and Visual Question Answering
    Zhou, Luping
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON MULTIMEDIA COMPUTING FOR HEALTH AND MEDICINE, MCHM 2024, 2024, : 3 - 4
  • [28] TYPE-AWARE MEDICAL VISUAL QUESTION ANSWERING
    Zhang, Anda
    Tao, Wei
    Li, Ziyan
    Wang, Haofen
    Zhang, Wenqiang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4838 - 4842
  • [29] Overcoming Data Limitation in Medical Visual Question Answering
    Nguyen, Binh D.
    Thanh-Toan Do
    Nguyen, Binh X.
    Do, Tuong
    Tjiputra, Erman
    Tran, Quang D.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT IV, 2019, 11767 : 522 - 530
  • [30] Generative Models in Medical Visual Question Answering: A Survey
    Dong, Wenjie
    Shen, Shuhao
    Han, Yuqiang
    Tan, Tao
    Wu, Jian
    Xu, Hongxia
APPLIED SCIENCES-BASEL, 2025, 15 (06)