Intelligent visual question answering in TCM education: An innovative application of IoT and multimodal fusion

Cited by: 0
Authors
Bi, Wei [1 ]
Xiong, Qingzhen [1 ]
Chen, Xingyi [1 ]
Du, Qingkun [2 ]
Wu, Jun [3 ]
Zhuang, Zhaoyu [4 ]
Affiliations
[1] Guangdong Univ Finance & Econ, Coll Art & Design, Guangzhou 510320, Peoples R China
[2] Guangdong Jiangmen Presch Teachers Coll, Coll Arts & Educ, Jiangmen 529000, Peoples R China
[3] Shenzhen Univ, Sch Art & Design, Div Arts, Shenzhen 518061, Peoples R China
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Keywords
Internet of Things (IoT); TCM education; Visual question answering (VQA); VisualBERT; Multimodal fusion; Deep learning;
DOI
10.1016/j.aej.2024.12.052
Chinese Library Classification
T [Industrial Technology];
Discipline Code
08 ;
Abstract
This paper proposes an innovative Traditional Chinese Medicine Ancient Text Education Intelligent Visual Question Answering System (TCM-VQA IoTNet), which integrates Internet of Things (IoT) technology with multimodal learning to achieve deep understanding of, and intelligent question answering over, both the images and the textual content of traditional Chinese medicine ancient texts. The system uses the VisualBERT model for multimodal feature extraction, combined with Gated Recurrent Units (GRU) to process time-series data from IoT sensors, and employs an attention mechanism to optimize feature fusion and dynamically adjust the question answering strategy. Experimental evaluations on standard datasets including VQA v2.0, CMRC 2018, and the Chinese Traditional Medicine Dataset show that TCM-VQA IoTNet achieves accuracy rates of 72.7%, 69%, and 75.4%, respectively, with F1-scores of 70.3%, 67.5%, and 73.9%, significantly outperforming existing mainstream models. Furthermore, TCM-VQA IoTNet performs well in practical traditional Chinese medicine education settings, noticeably enhancing the precision and interactivity of intelligent education. Future research will focus on improving the model's generalization ability and computational efficiency, further expanding its application potential in traditional Chinese medicine diagnosis and education.
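The abstract describes fusing VisualBERT image-text features with GRU-encoded sensor features via an attention mechanism. The paper does not give implementation details, so the following is only a minimal, dependency-free sketch of attention-weighted fusion; the feature vectors and relevance scores are hypothetical stand-ins for the real VisualBERT and GRU outputs, and the scoring layer is assumed rather than taken from the paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, scores):
    """Fuse same-dimension modality features as a softmax-weighted sum.

    features: list of equal-length feature vectors, one per modality.
    scores:   one relevance score per modality (in a real model these
              would come from a learned scoring layer).
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Hypothetical stand-ins for the two modalities named in the abstract:
visualbert_feat = [0.2, 0.8, 0.5, 0.1]   # image + text embedding (VisualBERT)
gru_sensor_feat = [0.6, 0.3, 0.9, 0.4]   # IoT sensor time-series embedding (GRU)
scores = [1.2, 0.4]                      # assumed modality relevance scores

fused = attention_fuse([visualbert_feat, gru_sensor_feat], scores)
```

Because the weights sum to one, each fused component stays between the corresponding components of the two input vectors; a higher score shifts the fusion toward that modality, which is the "dynamic adjustment" the abstract alludes to.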
Pages: 325-336 (12 pages)
Related Papers (50 total)
  • [31] Relational reasoning and adaptive fusion for visual question answering
    Shen, Xiang
    Han, Dezhi
    Zong, Liang
    Guo, Zihan
    Hua, Jie
    APPLIED INTELLIGENCE, 2024, 54 (06) : 5062 - 5080
  • [32] Dual-Key Multimodal Backdoors for Visual Question Answering
    Walmer, Matthew
    Sikka, Karan
    Sur, Indranil
    Shrivastava, Abhinav
    Jha, Susmit
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15354 - 15364
  • [33] FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts
    Singh, Shubhankar
    Chaurasia, Purvi
    Varun, Yerram
    Pandya, Pranshu
    Gupta, Vatsal
    Gupta, Vivek
    Roth, Dan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1330 - 1350
  • [34] Multimodal attention-driven visual question answering for Malayalam
    Kovath A.G.
    Nayyar A.
    Sikha O.K.
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (24) : 14691 - 14708
  • [35] Contrastive training of a multimodal encoder for medical visual question answering
    Silva, Joao Daniel
    Martins, Bruno
    Magalhaes, Joao
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2023, 18
  • [36] Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
    Saqur, Raeid
    Narasimhan, Karthik
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [37] Fusion of Detected Objects in Text for Visual Question Answering
    Alberti, Chris
    Ling, Jeffrey
    Collins, Michael
    Reitter, David
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2131 - 2140
  • [38] Visual Question Answering based on multimodal triplet knowledge accumulation
    Wang, Fengjuan
    An, Gaoyun
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 81 - 84
  • [39] Dual-Key Multimodal Backdoors for Visual Question Answering
    Walmer, Matthew
    Sikka, Karan
    Sur, Indranil
    Shrivastava, Abhinav
    Jha, Susmit
    arXiv, 2021
  • [40] CONTEXT RELATION FUSION MODEL FOR VISUAL QUESTION ANSWERING
    Zhang, Haotian
    Wu, Wei
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2112 - 2116