Feedback-Generation for Programming Exercises With GPT-4

Cited by: 6

Authors:

Azaiz, Imen ^{[1]}

Kiesler, Natalie ^{[2]}

Strickroth, Sven ^{[1]}

Affiliations:

[1] Ludwig Maximilians Univ Munchen, Munich, Germany

[2] Nuremberg Tech, Nurnberg, Germany

Source:

PROCEEDINGS OF THE 2024 CONFERENCE INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION, VOL 1, ITICSE 2024 | 2024

Keywords:

formative feedback; personalized feedback; assessment; introductory programming; Large Language Models; LLMs; GPT-4 Turbo; benchmarking

DOI:

10.1145/3649217.3653594

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203; 0835

Abstract:

Since Large Language Models (LLMs) and related applications became broadly available, several studies have investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if they are provided promptly and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistent feedback was noted, such as stating that the submission is correct but that an error needs to be fixed. The present work increases our understanding of LLMs' potential, limitations, and how to integrate them into e-assessment systems, pedagogical scenarios, and instructing students who are using applications based on GPT-4.


Pages: 31-37

Page count: 7

50 items in total

[1] Utilizing OpenAI's GPT-4 for written feedback
Carlson, Makenna
Pack, Austin
Escalante, Juan
TESOL JOURNAL, 2024, 15 (02)
[2] Leveraging Lecture Content for Improved Feedback: Explorations with GPT-4 and Retrieval Augmented Generation
Jacobs, Sven
Jaschke, Steffen
2024 36TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING EDUCATION AND TRAINING, CSEE & T 2024, 2024
[3] Exploring GPT-4 as MR Sequence and Reconstruction Programming Assistant GPT4MR
Zaiss, Moritz
Rajput, Junaid R.
Dang, Hoai N.
Golkov, Vladimir
Cremers, Daniel
Knoll, Florian
Maier, Andreas
BILDVERARBEITUNG FUR DIE MEDIZIN 2024, 2024, : 94 - 99
[4] Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation
Phung, Tung
Padurean, Victor-Alexandru
Singh, Anjali
Brooks, Christopher
Cambronero, Jose
Gulwani, Sumit
FOURTEENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, LAK 2024, 2024, : 12 - 23
[5] Evaluating GPT-4 on Impressions Generation in Radiology Reports
Sun, Zhaoyi
Ong, Hanley
Kennedy, Patrick
Tang, Liyan
Chen, Shirley
Elias, Jonathan
Lucas, Eugene
Shih, George
Peng, Yifan
RADIOLOGY, 2023, 307 (05)
[6] Unveiling the Role of GPT-4 in Solving LeetCode Programming Problems
Vishnu, Sarthak
Sahil
Garg, Naman
COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, 2025, 33 (01)
[7] Using GPT-4 to Provide Tiered, Formative Code Feedback
Ha Nguyen
Allan, Vicki
PROCEEDINGS OF THE 55TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE 2024, VOL. 1, 2024, : 958 - 964
[8] Prompting GPT-4 to support automatic safety case generation
Sivakumar, Mithila
Belle, Alvine B.
Shan, Jinjun
Shahandashti, Kimya Khakzad
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
[9] Is GPT-4 a reliable rater? Evaluating consistency in GPT-4's text ratings
Hackl, Veronika
Mueller, Alexandra Elena
Granitzer, Michael
Sailer, Maximilian
FRONTIERS IN EDUCATION, 2023, 8
[10] GPT-4 as a biomedical simulator
Schaefer M.
Reichl S.
ter Horst R.
Nicolas A.M.
Krausgruber T.
Piras F.
Stepper P.
Bock C.
Samwald M.
Computers in Biology and Medicine, 2024, 178
