Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Cited: 10
Authors
Moore, Steven [1 ]
Nguyen, Huy A. [1 ]
Chen, Tianying [1 ]
Stamper, John [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Question evaluation; Question quality; Rule-based; GPT-4; Item writing flaws
DOI
10.1007/978-3-031-42682-7_16
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203 ; 0835 ;
Abstract
Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom usage. Existing methods for evaluating multiple-choice questions often focus on machine readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine-learning based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, as compared to 79% by GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in the student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.
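The rule-based approach described in the abstract can be pictured as a set of pattern checks applied to a question's stem and options. The sketch below is a minimal, hypothetical illustration of that idea; the rule names, thresholds, and regular expressions are assumptions for illustration, not the authors' actual 19-flaw implementation.

```python
import re

def detect_flaws(stem: str, options: list[str], answer_index: int) -> list[str]:
    """Flag a few common item-writing flaws in one multiple-choice question.

    Illustrative rules only; the paper's method covers 19 flaw types.
    """
    flaws = []

    # Flaw: "all of the above" / "none of the above" options are discouraged
    # because they let test-takers answer via partial knowledge.
    for opt in options:
        if re.search(r"\b(all|none) of the above\b", opt, re.IGNORECASE):
            flaws.append("all/none-of-the-above option")
            break

    # Flaw: negatively worded stem (e.g., "Which of these is NOT ..."),
    # which tends to confuse students.
    if re.search(r"\b(not|except)\b", stem, re.IGNORECASE):
        flaws.append("negatively worded stem")

    # Flaw: the correct answer is conspicuously longer than the distractors,
    # which gives away the key (threshold of 1.5x is an arbitrary assumption).
    lengths = [len(opt) for opt in options]
    other_lengths = [n for i, n in enumerate(lengths) if i != answer_index]
    if other_lengths and lengths[answer_index] > 1.5 * max(other_lengths):
        flaws.append("longest option is correct")

    return flaws
```

Each rule is independent, so a question can accumulate multiple flags, and new flaw types can be added as further checks without retraining anything, which is one reason a rule-based pipeline can be cheap and consistent compared to prompting a large language model.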
Pages: 229-245
Page count: 17
Related papers
50 records total
  • [41] DISTANCE TEACHING USING SELF-MARKING MULTIPLE-CHOICE QUESTIONS
    POORE, P
    TROPICAL DOCTOR, 1987, 17 (01) : 39 - 41
  • [42] Using multiple-choice questions to evaluate in-depth learning of economics
    Buckles, S
    Siegfried, JJ
    JOURNAL OF ECONOMIC EDUCATION, 2006, 37 (01): : 48 - 57
  • [43] Using GPT-4 as a guide during inquiry-based learning
    Steinert, Steffen
    Avila, Karina E.
    Kuhn, Jochen
    Kuechemann, Stefan
    PHYSICS TEACHER, 2024, 62 (07): : 618 - 619
  • [44] A ChatGPT Prompt for Writing Case-Based Multiple-Choice Questions
    Kiyak, Yavuz Selim
    SPANISH JOURNAL OF MEDICAL EDUCATION, 2023, 4 (03): : 98 - 103
  • [45] Assessing novelty, feasibility and value of creative ideas with an unsupervised approach using GPT-4
    Kern, Felix B.
    Wu, Chien-Te
    Chao, Zenas C.
    BRITISH JOURNAL OF PSYCHOLOGY, 2024,
  • [46] Estimation of Confidence Based on Eye Gaze: an Application to Multiple-choice Questions
    Yamada, Kento
    Augereau, Olivier
    Kise, Koichi
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2017 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC '17 ADJUNCT), 2017, : 217 - 220
  • [47] Ten tips for effective use and quality assurance of multiple-choice questions in knowledge-based assessments
    Ali, Kamran
    Zahra, Daniel
    EUROPEAN JOURNAL OF DENTAL EDUCATION, 2024, 28 (02) : 655 - 662
  • [49] Beyond multiple-choice questions: using case-based learning patient questions to assess clinical reasoning
    Ferguson, Kristi J.
    MEDICAL EDUCATION, 2006, 40 (11) : 1143 - 1143
  • [50] Comparison of Electronic Examinations using Adaptive Multiple-choice Questions and Constructed-response Questions
    Stavroulakis, Peter J.
    Photopoulos, Panagiotis
    Ventouras, Errikos
    Triantis, Dimos
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED EDUCATION (CSEDU), VOL 1, 2020, : 358 - 365