Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI

Cited by: 0
Authors
Lee, Gyeonggeon [1, 2]
Zhai, Xiaoming [2, 3, 4]
Affiliations
[1] Natl Inst Educ, Nat Sci & Sci Educ Dept, Nat Sci & Sci Educ, 1 Nanyang Walk, Singapore 637616, Singapore
[2] Univ Georgia, AI4STEM Educ Ctr, 110 Carlton St, Athens, GA 30602 USA
[3] Univ Georgia, Natl GENIUS Ctr, 110 Carlton St, Athens, GA 30602 USA
[4] Univ Georgia, Dept Math Sci & Social Studies Educ, 110 Carlton St, Athens, GA 30602 USA
Funding
U.S. National Science Foundation;
Keywords
Artificial intelligence (AI); GPT-4V(ision); Visual question answering; Vision language model; Multimodality;
DOI
10.1007/s11528-024-01035-z
Chinese Library Classification (CLC) number
G40 [Education];
Discipline classification code
040101; 120403;
Abstract
Educators and researchers have analyzed various image data acquired from teaching and learning, such as images of learning materials, classroom dynamics, and students' drawings. However, this approach is labour-intensive and time-consuming, limiting its scalability and efficiency. Recent developments in Visual Question Answering (VQA) techniques have streamlined this process by allowing users to pose questions about the images and receive accurate, automatic answers, with both questions and answers in natural language, thereby reducing the time required for analysis. State-of-the-art Vision Language Models (VLMs) such as GPT-4V(ision) have extended the applications of VQA to a wide range of educational purposes. This report employs GPT-4V as an example to demonstrate the potential of VLMs in enabling and advancing VQA for education. Specifically, we demonstrate that GPT-4V enables VQA for educational scholars without requiring technical expertise, thereby reducing accessibility barriers for general users. In addition, we contend that GPT-4V spotlights the transformative potential of VQA for educational research, representing a milestone accomplishment for visual data analysis in education.
Pages: 271-287
Page count: 17
Related papers
50 in total
  • [1] Evaluation of Multimodal ChatGPT (GPT-4V) in Describing Mammography Image Features
    Haver, Hana
    Bahl, Manisha
    Doo, Florence
    Kamel, Peter
    Parekh, Vishwa
    Jeudy, Jean
    Yi, Paul
    CANADIAN ASSOCIATION OF RADIOLOGISTS JOURNAL-JOURNAL DE L'ASSOCIATION CANADIENNE DES RADIOLOGISTES, 2024, 75 (04): 947-949
  • [2] GPT-4V passes the BLS and ACLS examinations: An analysis of GPT-4V's image recognition capabilities
    King, Ryan C.
    Bharani, Vishnu
    Shah, Kunal
    Yeo, Yee Hui
    Samaan, Jamil S.
    RESUSCITATION, 2024, 195
  • [3] GPT-4V(ision) for Robotics: Multimodal Task Planning From Human Demonstration
    Wake, Naoki
    Kanehira, Atsushi
    Sasabuchi, Kazuhiro
    Takamatsu, Jun
    Ikeuchi, Katsushi
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (11): 10567-10574
  • [4] Map Reading and Analysis with GPT-4V(ision)
    Xu, Jinwen
    Tao, Ran
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2024, 13 (04)
  • [5] EduVQA: A multimodal Visual Question Answering framework for smart education
    Xiao, Jiongen
    Zhang, Zifeng
    ALEXANDRIA ENGINEERING JOURNAL, 2025, 122: 615-624
  • [6] Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Noda, Masao
    Ueno, Takayoshi
    Koshu, Ryota
    Takaso, Yuji
    Shimada, Mari Dias
    Saito, Chizu
    Sugimoto, Hisashi
    Fushiki, Hiroaki
    Ito, Makoto
    Nomura, Akihiro
    Yoshizaki, Tomokazu
    JMIR MEDICAL EDUCATION, 2024, 10
  • [7] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    INTELLIGENT COMPUTING, VOL 1, 2019, 858: 783-792
  • [8] Impact of Multimodal Prompt Elements on Diagnostic Performance of GPT-4V in Challenging Brain MRI Cases
    Schramm, Severin
    Preis, Silas
    Metz, Marie-Christin
    Jung, Kirsten
    Schmitz-Koep, Benita
    Zimmer, Claus
    Wiestler, Benedikt
    Hedderich, Dennis M.
    Kim, Su Hwan
    RADIOLOGY, 2025, 314 (01)
  • [9] Unveiling the clinical incapabilities: a benchmarking study of GPT-4V(ision) for ophthalmic multimodal image analysis
    Xu, Pusheng
    Chen, Xiaolan
    Zhao, Ziwei
    Shi, Danli
    BRITISH JOURNAL OF OPHTHALMOLOGY, 2024, 108 (10): 1384-1389
  • [10] Multimodal Learning and Reasoning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30