Feedback-Generation for Programming Exercises With GPT-4

Cited by: 6
Authors
Azaiz, Imen [1]
Kiesler, Natalie [2]
Strickroth, Sven [1]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Munich, Germany
[2] Nuremberg Tech, Nurnberg, Germany
Keywords
formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4 Turbo; benchmarking
DOI
10.1145/3649217.3653594
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Since Large Language Models (LLMs) and related applications became broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if these are provided in a timely manner and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that the submission is correct while also stating that an error needs to be fixed. The present work increases our understanding of LLMs' potential and limitations, and of how to integrate them into e-assessment systems and pedagogical scenarios and how to instruct students who are using applications based on GPT-4.
Pages: 31-37
Page count: 7
Related papers
50 in total
  • [21] GPT-4 Performance for Neurologic Localization
    Lee, Jung-Hyun
    Choi, Eunhee
    McDougal, Robert
    Lytton, William W.
    NEUROLOGY-CLINICAL PRACTICE, 2024, 14 (03)
  • [22] ChatGPT/GPT-4 and Spinal Surgeons
    Kleebayoon, Amnuay
    Wiwanitkit, Viroj
    ANNALS OF BIOMEDICAL ENGINEERING, 2023, 51 (08) : 1657 - 1657
  • [23] Is GPT-4 a Good Data Analyst?
    Cheng, Liying
    Li, Xingxuan
    Bing, Lidong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023 : 9496 - 9514
  • [24] PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4
    Abukhalaf, Seif
    Hamdaqa, Mohammad
    Khomh, Foutse
    PROCEEDINGS 2024 IEEE/ACM FIRST INTERNATIONAL CONFERENCE ON AI FOUNDATION MODELS AND SOFTWARE ENGINEERING, FORGE 2024, 2024 : 108 - 118
  • [25] Dynamic Reconfiguring of GPT-4 Based Tutors to Become GPT-4 Based Teachers in Underserved Areas in Africa and the Environs
    Butgereit, Laurie
    Abugosseisa, Muna Mahmoud
    Elbashir, Mohammed
    International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2024, 2024
  • [26] GPT-4 in Radiology: Improvements in Advanced Reasoning
    Bhayana, Rajesh
    Bleakney, Robert R.
    Krishna, Satheesh
    RADIOLOGY, 2023, 307 (05)
  • [27] Performance of GPT-4 on Chinese Nursing Examination
    Miao, Yiqun
    Luo, Yuan
    Zhao, Yuhan
    Li, Jiawei
    Liu, Mingxuan
    Wang, Huiying
    Chen, Yuling
    Wu, Ying
    NURSE EDUCATOR, 2024, 49 (06) : E338 - E343
  • [28] A Systematic Literature Review of Automated Feedback Generation for Programming Exercises
    Keuning, Hieke
    Jeuring, Johan
    Heeren, Bastiaan
    ACM TRANSACTIONS ON COMPUTING EDUCATION, 2019, 19 (01)
  • [29] Automated Financial Analysis Using GPT-4
    Noels, Sander
    Merlevede, Adriaan
    Fecheyr, Andrew
    Vanhalst, Maarten
    Meerlaen, Nick
    Viaene, Sebastien
    De Bie, Tijl
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VII, 2023, 14175 : 345 - 349
  • [30] Using GPT-4 to Generate Failure Logic
    Clegg, Kester
    Habli, Ibrahim
    McDermid, John
    COMPUTER SAFETY, RELIABILITY, AND SECURITY. SAFECOMP 2024 WORKSHOPS, 2024, 14989 : 148 - 159