Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

Cited: 4
Author
Mendonca, Nabor C. [1 ]
Affiliation
[1] Univ Fortaleza, Postgrad Program Appl Informat, Av Washington Soares, Fortaleza, Ceara, Brazil
Source
Keywords
Multimodal generative AI; ChatGPT-4 Vision; educational assessment; computer science education
DOI
10.1145/3674149
CLC number
G40 [Education];
Subject classification
040101; 120403
Abstract
The recent integration of visual capabilities into Large Language Models (LLMs) has the potential to play a pivotal role in science and technology education, where visual elements such as diagrams, charts, and tables are commonly used to improve the learning experience. This study investigates the performance of ChatGPT-4 Vision, OpenAI's most advanced visual model at the time the study was conducted, on the Bachelor in Computer Science section of Brazil's 2021 National Undergraduate Exam (ENADE). By presenting the model with the exam's open and multiple-choice questions in their original image format and allowing for reassessment in response to differing answer keys, we were able to evaluate the model's reasoning and self-reflection capabilities in a large-scale academic assessment involving textual and visual content. ChatGPT-4 Vision significantly outperformed the average exam participant, placing within the top 10 percent of scores. While it excelled in questions that incorporated visual elements, it also encountered challenges with question interpretation, logical reasoning, and visual acuity. A positive correlation between the model's performance on multiple-choice questions and the performance distribution of the human participants suggests that multimodal LLMs can serve as a useful tool for question testing and refinement. However, the involvement of an independent expert panel to review cases of disagreement between the model and the answer key revealed some poorly constructed questions containing vague or ambiguous statements, calling attention to the critical need for improved question design in future exams. Our findings suggest that while ChatGPT-4 Vision shows promise in multimodal academic evaluations, human oversight remains crucial for verifying the model's accuracy and ensuring the fairness of high-stakes educational exams. The paper's research materials are publicly available at https://github.com/nabormendonca/gpt-4v-enade-cs-2021.
Pages: 56