Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI

Cited by: 0

Authors:

Lee, Gyeonggeon ^{[1,2]}

Zhai, Xiaoming ^{[2,3,4]}

Affiliations:

[1] Natl Inst Educ, Nat Sci & Sci Educ Dept, Nat Sci & Sci Educ, 1 Nanyang Walk, Singapore 637616, Singapore

[2] Univ Georgia, AI4STEM Educ Ctr, 110 Carlton St, Athens, GA 30602 USA

[3] Univ Georgia, Natl GENIUS Ctr, 110 Carlton St, Athens, GA 30602 USA

[4] Univ Georgia, Dept Math Sci & Social Studies Educ, 110 Carlton St, Athens, GA 30602 USA

Source:

TECHTRENDS | 2025

Funding:

U.S. National Science Foundation;

Keywords:

Artificial intelligence (AI); GPT-4V(ision); Visual question answering; Vision language model; Multimodality;

DOI:

10.1007/s11528-024-01035-z

Chinese Library Classification (CLC) number:

G40 [Education];

Discipline classification code:

040101 ; 120403 ;

Abstract:

Educators and researchers have analyzed various image data acquired from teaching and learning, such as images of learning materials, classroom dynamics, students' drawings, etc. However, this approach is labour-intensive and time-consuming, limiting its scalability and efficiency. The recent development in the Visual Question Answering (VQA) technique has streamlined this process by allowing users to pose questions about the images and receive accurate and automatic answers, both in natural language, thereby enhancing efficiency and reducing the time required for analysis. State-of-the-art Vision Language Models (VLMs) such as GPT-4V(ision) have extended the applications of VQA to a wide range of educational purposes. This report employs GPT-4V as an example to demonstrate the potential of VLMs in enabling and advancing VQA for education. Specifically, we demonstrate that GPT-4V enables VQA for educational scholars without requiring technical expertise, thereby reducing accessibility barriers for general users. In addition, we contend that GPT-4V spotlights the transformative potential of VQA for educational research, representing a milestone accomplishment for visual data analysis in education.


Pages: 271 - 287

Page count: 17
