Intelligent visual question answering in TCM education: An innovative application of IoT and multimodal fusion

Citations: 0
Authors
Bi, Wei [1]
Xiong, Qingzhen [1]
Chen, Xingyi [1]
Du, Qingkun [2]
Wu, Jun [3]
Zhuang, Zhaoyu [4]
Affiliations
[1] Guangdong Univ Finance & Econ, Coll Art & Design, Guangzhou 510320, Peoples R China
[2] Guangdong Jiangmen Presch Teachers Coll, Coll Arts & Educ, Jiangmen 529000, Peoples R China
[3] Shenzhen Univ, Sch Art & Design, Div Arts, Shenzhen 518061, Peoples R China
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Keywords
Internet of Things (IoT); TCM education; Visual question answering (VQA); VisualBERT; Multimodal fusion; Deep learning;
DOI
10.1016/j.aej.2024.12.052
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
This paper proposes an innovative Traditional Chinese Medicine Ancient Text Education Intelligent Visual Question Answering System (TCM-VQA IoTNet), which integrates Internet of Things (IoT) technology with multimodal learning to achieve a deep understanding and intelligent question answering of both the images and textual content of traditional Chinese medicine ancient texts. The system utilizes the VisualBERT model for multimodal feature extraction, combined with Gated Recurrent Units (GRU) to process time-series data from IoT sensors, and employs an attention mechanism to optimize feature fusion, dynamically adjusting the question answering strategy. Experimental evaluations on standard datasets such as VQA v2.0, CMRC 2018, and the Chinese Traditional Medicine Dataset demonstrate that TCM-VQA IoTNet achieves accuracy rates of 72.7%, 69.%, and 75.4% respectively, with F1-scores of 70.3%, 67.5%, and 73.9%, significantly outperforming existing mainstream models. Furthermore, TCM-VQA IoTNet has shown excellent performance in practical applications of traditional Chinese medicine education, significantly enhancing the precision and interactivity of intelligent education. Future research will focus on improving the model's generalization ability and computational efficiency, further expanding its application potential in traditional Chinese medicine diagnosis and education.
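The pipeline described in the abstract (a GRU over IoT sensor time series, followed by attention-weighted fusion with multimodal features) can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random weights, the `query` vector, and the `visual_text_feature` stand-in for a VisualBERT output are all assumptions made purely for illustration, and every function name here is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_gru_params(input_dim, hidden_dim, seed=0):
    # Random weights stand in for trained parameters (assumption).
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("z", "r", "n"):
        params["W" + gate] = mat(hidden_dim, input_dim)
        params["U" + gate] = mat(hidden_dim, hidden_dim)
    return params


def gru_step(x, h, p):
    # One GRU update: update gate z, reset gate r, candidate state n.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b)
         for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], reset_h))]
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, features):
    # Scaled dot-product scores, one per modality, then a weighted sum.
    dim = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
              for feat in features]
    weights = softmax(scores)
    fused = [sum(w * feat[i] for w, feat in zip(weights, features))
             for i in range(dim)]
    return fused, weights


# Hypothetical sensor sequence: three time steps, three channels each.
sensor_readings = [[0.2, 0.5, 0.1], [0.3, 0.4, 0.2], [0.1, 0.6, 0.3]]
params = init_gru_params(input_dim=3, hidden_dim=4)
hidden = [0.0] * 4
for reading in sensor_readings:
    hidden = gru_step(reading, hidden, params)

visual_text_feature = [0.4, -0.1, 0.7, 0.2]  # stand-in for a VisualBERT output
query = [0.5, 0.5, 0.5, 0.5]                 # hypothetical learned query vector
fused, weights = attention_fuse(query, [visual_text_feature, hidden])
print("attention weights:", weights)
print("fused feature:", fused)
```

In this sketch the attention weights decide how much the fused vector draws on the image-text feature versus the sensor summary, which is one plausible reading of "dynamically adjusting the question answering strategy"; the paper itself should be consulted for the actual fusion design.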
Pages: 325-336
Page count: 12
Related papers
50 in total
  • [21] Information fusion in visual question answering: A Survey
    Zhang, Dongxiang
    Cao, Rui
    Wu, Sai
    INFORMATION FUSION, 2019, 52 : 268 - 280
  • [22] Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI
    Lee, Gyeonggeon
    Zhai, Xiaoming
    TECHTRENDS, 2025, : 271 - 287
  • [23] Multimodal Knowledge Reasoning for Enhanced Visual Question Answering
    Hussain, Afzaal
    Maqsood, Ifrah
    Shahzad, Muhammad
    Fraz, Muhammad Moazam
    2022 16TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS, SITIS, 2022, : 224 - 230
  • [24] MUREL: Multimodal Relational Reasoning for Visual Question Answering
    Cadene, Remi
    Ben-younes, Hedi
    Cord, Matthieu
    Thome, Nicolas
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1989 - 1998
  • [25] Multimodal Prompt Retrieval for Generative Visual Question Answering
    Ossowski, Timothy
    Hu, Junjie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 2518 - 2535
  • [26] QAlayout: Question Answering Layout Based on Multimodal Attention for Visual Question Answering on Corporate Document
    Mahamoud, Ibrahim Souleiman
    Coustaty, Mickael
    Joseph, Aurelie
    d'Andecy, Vincent Poulain
    Ogier, Jean-Marc
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 659 - 673
  • [27] Application of Multimodal Transformer Model in Intelligent Agricultural Disease Detection and Question-Answering Systems
    Lu, Yuchun
    Lu, Xiaoyi
    Zheng, Liping
    Sun, Min
    Chen, Siyu
    Chen, Baiyan
    Wang, Tong
    Yang, Jiming
    Lv, Chunli
    PLANTS-BASEL, 2024, 13 (07):
  • [28] Multimodal Encoders and Decoders with Gate Attention for Visual Question Answering
    Li, Haiyan
    Han, Dezhi
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2021, 18 (03) : 1023 - 1040
  • [29] Multimodal Local Perception Bilinear Pooling for Visual Question Answering
    Lao, Mingrui
    Guo, Yanming
    Wang, Hui
    Zhang, Xin
    IEEE ACCESS, 2018, 6 : 57923 - 57932
  • [30] ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING
    Gu, Geonmo
    Kim, Seong Tae
    Ro, Yong Man
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 997 - 1002