Intelligent visual question answering in TCM education: An innovative application of IoT and multimodal fusion

Times Cited: 0
Authors
Bi, Wei [1 ]
Xiong, Qingzhen [1 ]
Chen, Xingyi [1 ]
Du, Qingkun [2 ]
Wu, Jun [3 ]
Zhuang, Zhaoyu [4 ]
Affiliations
[1] Guangdong Univ Finance & Econ, Coll Art & Design, Guangzhou 510320, Peoples R China
[2] Guangdong Jiangmen Presch Teachers Coll, Coll Arts & Educ, Jiangmen 529000, Peoples R China
[3] Shenzhen Univ, Sch Art & Design, Div Arts, Shenzhen 518061, Peoples R China
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Keywords
Internet of Things (IoT); TCM education; Visual question answering (VQA); VisualBERT; Multimodal fusion; Deep learning;
DOI
10.1016/j.aej.2024.12.052
CLC Number
T [Industrial Technology];
Subject Classification Code
08;
Abstract
This paper proposes an innovative Traditional Chinese Medicine Ancient Text Education Intelligent Visual Question Answering System (TCM-VQA IoTNet), which integrates Internet of Things (IoT) technology with multimodal learning to achieve deep understanding of, and intelligent question answering over, both the images and the textual content of ancient TCM texts. The system uses the VisualBERT model for multimodal feature extraction, combines it with Gated Recurrent Units (GRUs) to process time-series data from IoT sensors, and employs an attention mechanism to optimize feature fusion, dynamically adjusting the question-answering strategy. Experimental evaluations on standard datasets, including VQA v2.0, CMRC 2018, and the Chinese Traditional Medicine Dataset, show that TCM-VQA IoTNet achieves accuracy rates of 72.7%, 69.%, and 75.4% respectively, with F1-scores of 70.3%, 67.5%, and 73.9%, significantly outperforming existing mainstream models. TCM-VQA IoTNet has also performed well in practical TCM education settings, markedly improving the precision and interactivity of intelligent education. Future research will focus on improving the model's generalization ability and computational efficiency, further expanding its application potential in TCM diagnosis and education.
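The abstract describes a three-stage pipeline: multimodal (image + text) feature extraction, GRU encoding of IoT sensor time series, and attention-based fusion of the two. The following is a minimal NumPy sketch of that pattern, not the authors' implementation: the token features are random stand-ins for real VisualBERT embeddings, the GRU weights are untrained random matrices, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(series, hidden=64):
    """Run a minimal single-layer GRU over a (T, d_in) sensor series
    and return the final hidden state (hidden,). Biases omitted."""
    d_in = series.shape[1]
    Wz, Wr, Wh = (rng.standard_normal((d_in, hidden)) * 0.1 for _ in range(3))
    Uz, Ur, Uh = (rng.standard_normal((hidden, hidden)) * 0.1 for _ in range(3))
    h = np.zeros(hidden)
    for x in series:
        z = sigmoid(x @ Wz + h @ Uz)              # update gate
        r = sigmoid(x @ Wr + h @ Ur)              # reset gate
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
        h = (1 - z) * h + z * h_tilde
    return h

def attention_fuse(token_feats, sensor_state):
    """Use the GRU sensor state as the query in scaled dot-product
    attention over the (N, d) multimodal token features."""
    d = sensor_state.shape[0]
    scores = token_feats @ sensor_state / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over N tokens
    return weights @ token_feats, weights         # fused (d,), weights (N,)

# Stand-ins: 12 VisualBERT-style token embeddings, 20 sensor readings.
token_feats = rng.standard_normal((12, 64))
sensor_series = rng.standard_normal((20, 8))

sensor_state = gru_encode(sensor_series, hidden=64)
fused, weights = attention_fuse(token_feats, sensor_state)
print(fused.shape, weights.shape)
```

The fused vector would then feed an answer head; in the paper's setting the attention weights are what let sensor context dynamically reweight the text-image features.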
Pages: 325 - 336 (12 pages)