Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

Cited by: 4
Authors
Mendonca, Nabor C. [1]
Affiliation
[1] Univ Fortaleza, Postgrad Program Appl Informat, Av Washington Soares, Fortaleza, Ceara, Brazil
Keywords
Multimodal generative AI; ChatGPT-4 Vision; educational assessment; computer science education
DOI
10.1145/3674149
Chinese Library Classification (CLC)
G40 [Education]
Discipline Classification Code
040101; 120403
Abstract
The recent integration of visual capabilities into Large Language Models (LLMs) has the potential to play a pivotal role in science and technology education, where visual elements such as diagrams, charts, and tables are commonly used to improve the learning experience. This study investigates the performance of ChatGPT-4 Vision, OpenAI's most advanced visual model at the time the study was conducted, on the Bachelor in Computer Science section of Brazil's 2021 National Undergraduate Exam (ENADE). By presenting the model with the exam's open and multiple-choice questions in their original image format, and allowing it to reassess its responses when they diverged from the official answer key, we were able to evaluate the model's reasoning and self-reflection capabilities in a large-scale academic assessment involving textual and visual content. ChatGPT-4 Vision significantly outperformed the average exam participant, placing within the top 10 percent of scores. While it excelled in questions that incorporated visual elements, it also encountered challenges with question interpretation, logical reasoning, and visual acuity. A positive correlation between the model's performance on multiple-choice questions and the performance distribution of the human participants suggests that multimodal LLMs can provide a useful tool for question testing and refinement. However, a review by an independent expert panel of the cases in which the model and the answer key disagreed revealed some poorly constructed questions containing vague or ambiguous statements, calling attention to the critical need for improved question design in future exams. Our findings suggest that while ChatGPT-4 Vision shows promise in multimodal academic evaluations, human oversight remains crucial for verifying the model's accuracy and ensuring the fairness of high-stakes educational exams. The paper's research materials are publicly available at https://github.com/nabormendonca/gpt-4v-enade-cs-2021.
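The study's exact prompts and evaluation pipeline are available in the GitHub repository linked above. Purely as a rough illustration of the kind of API call the abstract describes (submitting a scanned exam question as an image to a vision-capable GPT-4 model), the following minimal Python sketch uses OpenAI's official client; the model id, prompt wording, and file name are assumptions for illustration and do not reproduce the authors' protocol, nor the follow-up reassessment step applied when the model's answer differed from the answer key.

# Minimal sketch (not the authors' exact protocol): send one exam question,
# scanned as a PNG, to a vision-capable GPT-4 model and print its answer.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    # The vision API accepts local images as base64-encoded data URLs.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def answer_question(image_path: str) -> str:
    # Ask the model to solve one multiple-choice question given as an image.
    image_b64 = encode_image(image_path)
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model id; substitute the current vision model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Solve this multiple-choice exam question. "
                         "Explain your reasoning, then state the letter of your answer."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        max_tokens=1024,
    )
    return response.choices[0].message.content

print(answer_question("enade_2021_q12.png"))  # hypothetical file name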
Pages: 56