Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations

Cited by: 30
|
Authors
Reeves, Brent [1 ]
Sarsa, Sami [2 ]
Prather, James [1 ]
Denny, Paul [3 ]
Becker, Brett A. [4 ]
Hellas, Arto [2 ]
Kimmel, Bailey [1 ]
Powell, Garrett [1 ]
Leinonen, Juho [3 ]
Affiliations
[1] Abilene Christian Univ, Abilene, TX 79699 USA
[2] Aalto Univ, Espoo, Finland
[3] Univ Auckland, Auckland, New Zealand
[4] Univ Coll Dublin, Dublin, Ireland
Keywords
academic integrity; AI; artificial intelligence; ChatGPT; code generation; code writing; Codex; computer programming; Copilot; CS1; deep learning; generative AI; introductory programming; GitHub; GPT-3; large language models; machine learning; ML; neural networks; natural language processing; novice programming; OpenAI
DOI
10.1145/3587102.3588805
Chinese Library Classification
G40 [Education];
Discipline Code
040101; 120403
Abstract
The recent emergence of code generation tools powered by large language models has attracted wide attention. Models such as OpenAI Codex can take natural language problem descriptions as input and generate highly accurate source code solutions, with potentially significant implications for computing education. Given the many complexities that students face when learning to write code, they may quickly become reliant on such tools without properly understanding the underlying concepts. One popular approach for scaffolding the code writing process is to use Parsons problems, which present solution lines of code in a scrambled order. These remove the complexities of low-level syntax, and allow students to focus on algorithmic and design-level problem solving. It is unclear how well code generation models can be applied to solve Parsons problems, given the mechanics of these models and prior evidence that they underperform when problems include specific restrictions. In this paper, we explore the performance of the Codex model for solving Parsons problems over various prompt variations. Using a corpus of Parsons problems we sourced from the computing education literature, we find that Codex successfully reorders the problem blocks about half of the time, a much lower rate of success when compared to prior work on more free-form programming tasks. Regarding prompts, we find that small variations in prompting have a noticeable effect on model performance, although the effect is not as pronounced as between different problems.
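The abstract describes the study's core setup: solution lines of a Parsons problem are presented in scrambled order, a prompt asks the model to restore them, and the response is graded against the reference ordering. The paper does not reproduce its exact prompts or grading code, so the following is only a minimal sketch of that workflow; the example problem, prompt wording, and exact-match grading criterion are all assumptions for illustration, not the authors' materials.

```python
import random

# Hypothetical Parsons problem: a reference solution whose lines will be
# scrambled (not taken from the paper's corpus).
solution = [
    "def average(numbers):",
    "    total = 0",
    "    for n in numbers:",
    "        total += n",
    "    return total / len(numbers)",
]

def make_parsons_prompt(lines, seed=0):
    """Shuffle the solution lines and wrap them in a natural-language
    instruction asking a code model to restore the correct order."""
    scrambled = lines[:]
    random.Random(seed).shuffle(scrambled)
    return (
        "Rearrange the following lines of Python code into a working "
        "function. Use every line exactly once:\n\n" + "\n".join(scrambled)
    )

def is_correct(model_output, reference):
    """Grade a response: the reordered lines (ignoring blank lines) must
    match the reference solution exactly, line for line."""
    returned = [ln for ln in model_output.splitlines() if ln.strip()]
    return returned == reference

prompt = make_parsons_prompt(solution)
# A perfect response simply restores the original ordering.
assert is_correct("\n".join(solution), solution)
```

Under this kind of exact-ordering criterion, the abstract's headline result is that Codex succeeds on roughly half of the problems, noticeably below its success rates on free-form code-writing tasks reported in prior work.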
Pages: 299-305
Page count: 7