Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Cited by: 10

Authors:

Moore, Steven [1]

Nguyen, Huy A. [1]

Chen, Tianying [1]

Stamper, John [1]

Affiliation:

[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

Source:

RESPONSIVE AND SUSTAINABLE EDUCATIONAL FUTURES, EC-TEL 2023 | 2023 / Vol. 14200

Keywords:

Question evaluation; Question quality; Rule-based; GPT-4; ITEM WRITING FLAWS;

DOI:

10.1007/978-3-031-42682-7_16

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom usage. Existing methods for evaluating multiple-choice questions often focus on machine readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine-learning-based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, as compared to 79% by GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in the student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.


Pages: 229-245

Page count: 17
