Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

Cited by: 4
Authors
Mendonca, Nabor C. [1]
Affiliation
[1] Univ Fortaleza, Postgrad Program Appl Informat, Av Washington Soares, Fortaleza, Ceara, Brazil
Keywords
Multimodal generative AI; ChatGPT-4 Vision; educational assessment; computer science education
DOI
10.1145/3674149
Chinese Library Classification (CLC)
G40 [Education]
Discipline Classification Code
040101; 120403
Abstract
The recent integration of visual capabilities into Large Language Models (LLMs) has the potential to play a pivotal role in science and technology education, where visual elements such as diagrams, charts, and tables are commonly used to improve the learning experience. This study investigates the performance of ChatGPT-4 Vision, OpenAI's most advanced visual model at the time the study was conducted, on the Bachelor in Computer Science section of Brazil's 2021 National Undergraduate Exam (ENADE). By presenting the model with the exam's open and multiple-choice questions in their original image format, and allowing it to reassess its responses when they diverged from the official answer key, we were able to evaluate the model's reasoning and self-reflection capabilities in a large-scale academic assessment involving textual and visual content. ChatGPT-4 Vision significantly outperformed the average exam participant, placing within the top 10 percent of scores. While it excelled in questions that incorporated visual elements, it also encountered challenges with question interpretation, logical reasoning, and visual acuity. A positive correlation between the model's performance on multiple-choice questions and the performance distribution of the human participants suggests that multimodal LLMs can provide a useful tool for question testing and refinement. However, a review by an independent expert panel of the cases in which the model and the answer key disagreed revealed some poorly constructed questions containing vague or ambiguous statements, calling attention to the critical need for improved question design in future exams. Our findings suggest that while ChatGPT-4 Vision shows promise in multimodal academic evaluations, human oversight remains crucial for verifying the model's accuracy and ensuring the fairness of high-stakes educational exams. The paper's research materials are publicly available at https://github.com/nabormendonca/gpt-4v-enade-cs-2021.
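The study's exact prompts and evaluation pipeline are available in the GitHub repository linked above. Purely as a rough illustration of the kind of API call the abstract describes (submitting a scanned exam question as an image to a vision-capable GPT-4 model), the following minimal Python sketch uses OpenAI's official client; the model id, prompt wording, and file name are assumptions for illustration and do not reproduce the authors' protocol, nor the follow-up reassessment step applied when the model's answer differed from the answer key.

# Minimal sketch (not the authors' exact protocol): send one exam question,
# scanned as a PNG, to a vision-capable GPT-4 model and print its answer.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    # The vision API accepts local images as base64-encoded data URLs.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def answer_question(image_path: str) -> str:
    # Ask the model to solve one multiple-choice question given as an image.
    image_b64 = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model id; substitute the current vision model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Solve this multiple-choice exam question. "
                         "Explain your reasoning, then state the letter of your answer."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        max_tokens=1024,
    )
    return response.choices[0].message.content

print(answer_question("enade_2021_q12.png"))  # hypothetical file name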
Pages: 56