Deep Fuzzy Multiteacher Distillation Network for Medical Visual Question Answering

Cited by: 1
Authors
Liu, Yishu [1 ]
Chen, Bingzhi [2 ]
Wang, Shuihua [3 ]
Lu, Guangming [1 ]
Zhang, Zheng [4 ,5 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Med Biometr Percept & Anal Engn Lab, Shenzhen 518055, Peoples R China
[2] South China Normal Univ, Sch Software, Foshan 528200, Peoples R China
[3] Xian Jiaotong Liverpool Univ, Dept Biol Sci, Suzhou 215123, Peoples R China
[4] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[5] Peng Cheng Laboratory, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Biomedical imaging; Visualization; Fuzzy logic; Uncertainty; Task analysis; Question answering (information retrieval); Transformers; Fuzzy deep learning; fuzzy logic; knowledge distillation (KD); medical visual question answering (VQA);
DOI
10.1109/TFUZZ.2024.3402086
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical visual question answering (medical VQA) is a critical cross-modal interaction task that has garnered considerable attention in the medical domain. Several existing methods leverage vision-and-language pretraining paradigms to mitigate the limitation of small-scale data. Nevertheless, most of them still suffer from two challenges that remain open for further research: 1) limited research has focused on distilling representations from a complete modality to guide the representation learning of masked data in the other modality; and 2) multimodal fusion based on self-attention mechanisms cannot effectively handle the inherent uncertainty and vagueness of information interaction across modalities. To mitigate these issues, in this article, we propose a novel deep fuzzy multiteacher distillation (DFMD) network for medical VQA, which takes advantage of fuzzy logic to model the uncertainties in vision-language representations across modalities within a multiteacher framework. Specifically, a multiteacher knowledge distillation module is conceived to assist in reconstructing the missing semantics under supervision signals generated by teachers from the other, complete modality, achieving more robust semantic interaction across modalities. Incorporating insights from fuzzy logic theory, we propose a noise-robust encoder called FuzBERT that enables our DFMD model to reduce the imprecision and ambiguity in feature representation during the multimodal interaction process. To the best of our knowledge, our work is the first attempt to combine fuzzy logic theory with a transformer-based encoder to effectively learn multimodal representations for medical VQA. Experimental results on the VQA-RAD and SLAKE datasets consistently demonstrate the superiority of our proposed DFMD method over state-of-the-art baselines.
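For context, the multiteacher knowledge distillation the abstract refers to typically trains a student against the combined soft targets of several teacher models. A minimal sketch of that general idea follows; the uniform teacher weighting and the temperature value are illustrative assumptions, not details taken from the paper:

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of raw scores."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def multiteacher_kd_loss(student_logits, teacher_logits_list, temperature=2.0):
    """KL divergence from the averaged teachers' soft targets to the
    student's soft predictions, scaled by T^2 as in standard distillation."""
    n_teachers = len(teacher_logits_list)
    # Uniformly average the teachers' softened distributions
    # (a weighted combination is equally common).
    avg_teacher = [0.0] * len(student_logits)
    for t_logits in teacher_logits_list:
        for i, p in enumerate(softmax(t_logits, temperature)):
            avg_teacher[i] += p / n_teachers
    student = softmax(student_logits, temperature)
    # KL(avg_teacher || student); zero when the student matches the teachers.
    kl = sum(p * math.log(p / q) for p, q in zip(avg_teacher, student) if p > 0)
    return kl * temperature ** 2
```

In practice this distillation term is added to the ordinary task loss, so the student learns both from ground-truth answers and from the teachers' softened output distributions.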
Pages: 5413-5427
Number of pages: 15
Related Papers
50 records in total
  • [1] Trilinear Distillation Learning and Question Feature Capturing for Medical Visual Question Answering
    Long, Shaopei
    Li, Yong
    Weng, Heng
    Tang, Buzhou
    Wang, Fu Lee
    Hao, Tianyong
    NEURAL COMPUTING FOR ADVANCED APPLICATIONS, NCAA 2024, PT III, 2025, 2183 : 162 - 177
  • [2] Hierarchical deep multi-modal network for medical visual question answering
    Gupta D.
    Suman S.
    Ekbal A.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
  • [3] Answer Distillation Network With Bi-Text-Image Attention for Medical Visual Question Answering
    Gong, Hongfang
    Li, Li
    IEEE ACCESS, 2025, 13 : 16455 - 16465
  • [4] Answer Distillation for Visual Question Answering
    Fang, Zhiwei
    Liu, Jing
    Tang, Qu
    Li, Yong
    Lu, Hanqing
    COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 72 - 87
  • [5] Optimal Deep Neural Network-Based Model for Answering Visual Medical Question
    Gasmi, Karim
    Ben Ltaifa, Ibtihel
    Lejeune, Gael
    Alshammari, Hamoud
    Ben Ammar, Lassaad
    Mahmood, Mahmood A.
    CYBERNETICS AND SYSTEMS, 2022, 53 (05) : 403 - 424
  • [6] Deep Attention Neural Tensor Network for Visual Question Answering
    Bai, Yalong
    Fu, Jianlong
    Zhao, Tiejun
    Mei, Tao
    COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216 : 21 - 37
  • [7] Deep Modular Bilinear Attention Network for Visual Question Answering
    Yan, Feng
    Silamu, Wushouer
    Li, Yanbing
    SENSORS, 2022, 22 (03)
  • [8] Question-guided feature pyramid network for medical visual question answering
    Yu, Yonglin
    Li, Haifeng
    Shi, Hanrong
    Li, Lin
    Xiao, Jun
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 214
  • [9] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143
  • [10] Learning to Specialize with Knowledge Distillation for Visual Question Answering
    Mun, Jonghwan
    Lee, Kimin
    Shin, Jinwoo
    Han, Bohyung
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31