Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Cited: 10
Authors
Moore, Steven [1 ]
Nguyen, Huy A. [1 ]
Chen, Tianying [1 ]
Stamper, John [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Question evaluation; Question quality; Rule-based; GPT-4; ITEM WRITING FLAWS
DOI
10.1007/978-3-031-42682-7_16
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification
081203; 0835
Abstract
Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom use. Existing methods for evaluating multiple-choice questions often focus on machine readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine learning-based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, compared to 79% for GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in the student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.
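To make the comparison concrete, the following is a minimal Python sketch in the spirit of the two approaches the abstract describes: a handful of rule-based checks for item-writing flaws, plus a prompt builder for an LLM-based reviewer such as GPT-4. The specific rules, the 1.5x length threshold, and the prompt wording are illustrative assumptions; the paper's actual 19 flaw rules and its GPT-4 prompt are not reproduced here.

# Illustrative sketch only: the rule names, thresholds, and prompt below are
# assumptions, not the 19 item-writing-flaw rules or prompt used by Moore et al.
from dataclasses import dataclass

@dataclass
class MCQ:
    stem: str
    options: list[str]
    answer_index: int  # index of the correct option

def detect_flaws(q: MCQ) -> list[str]:
    """Return the names of the illustrative item-writing flaws found in q."""
    flaws = []

    # "All/none of the above" options reward test-taking strategy over knowledge.
    if any(opt.strip().lower() in {"all of the above", "none of the above"}
           for opt in q.options):
        flaws.append("all_or_none_of_the_above")

    # Negatively worded stems ("NOT", "EXCEPT") are easy to misread.
    stem_words = q.stem.lower().replace("?", "").split()
    if "not" in stem_words or "except" in stem_words:
        flaws.append("negative_stem")

    # A correct answer much longer than the distractors is an unintended cue
    # (the 1.5x threshold is an assumption).
    lengths = [len(opt) for opt in q.options]
    others = [n for i, n in enumerate(lengths) if i != q.answer_index]
    if others and lengths[q.answer_index] > 1.5 * (sum(others) / len(others)):
        flaws.append("longest_option_correct")

    return flaws

def build_llm_prompt(q: MCQ, flaw_names: list[str]) -> str:
    """Assemble a review prompt for an LLM-based checker (e.g. GPT-4)."""
    options = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(q.options))
    return ("You are reviewing a multiple-choice question for item-writing flaws.\n"
            f"Question: {q.stem}\n{options}\n"
            f"Flaws to check for: {', '.join(flaw_names)}\n"
            "List each flaw that applies, or answer 'none'.")

if __name__ == "__main__":
    q = MCQ(
        stem="Which of the following is NOT a sorting algorithm?",
        options=["Quicksort", "Dijkstra's algorithm",
                 "Mergesort", "All of the above"],
        answer_index=1,
    )
    print(detect_flaws(q))
    # -> ['all_or_none_of_the_above', 'negative_stem', 'longest_option_correct']

A full pipeline would run checks like these over each student-generated question and compare the flagged flaws against human annotations, which is the evaluation the abstract reports (91% detection for the rule-based method versus 79% for GPT-4).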
Pages: 229-245 (17 pages)
Related Papers (50 total)
  • [21] Evaluating the Quality of Multiple-Choice Test Questions in the Postlicensure Environment
    Makhija, Hirsh
    Schneid, Stephen D.
    Kalinowski, Amy
    Mandel, Jess
    Davidson, Judy E.
    JOURNAL OF CONTINUING EDUCATION IN NURSING, 2024, 55 (10): 487-492
  • [22] Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods
    Bhattarai, Kriti
    Oh, Inez Y.
    Sierra, Jonathan Moran
    Tang, Jonathan
    Payne, Philip R. O.
    Abrams, Zach
    Lai, Albert M.
    JAMIA OPEN, 2024, 7 (03)
  • [23] Assessing information literacy using multiple-choice questionnaires
    Beutelspacher, Lisa
    INFORMATION-WISSENSCHAFT UND PRAXIS, 2014, 65 (06): 341-352
  • [24] Evaluation of the quality of multiple-choice questions according to the students' academic level
    Iñarrairaegui, Mercedes
    Fernández-Ros, Nerea
    Lucena, Felipe
    Landecho, Manuel F.
    García, Nicolás
    Quiroga, Jorge
    Ignacio Herrero, José
    BMC MEDICAL EDUCATION, 2022, 22 (01)
  • [25] Strategies Used to Improve Quality of Multiple-Choice Questions: A Systematic Review
    Hye, Tanvirul
    Roni, Monzurul
    JOURNAL OF PHARMACOLOGY AND EXPERIMENTAL THERAPEUTICS, 2023, 385
  • [27] Automated Assessment with Multiple-choice Questions using Weighted Answers
    Zampirolli, Francisco de Assis
    Batista, Valerio Ramos
    Rodriguez, Carla
    da Rocha, Rafaela Vilela
    Goya, Denise
    CSEDU: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED EDUCATION - VOL 1, 2021: 254-261
  • [28] CONSTRUCTED RESPONSE OR MULTIPLE-CHOICE QUESTIONS FOR ASSESSING DECLARATIVE PROGRAMMING KNOWLEDGE? THAT IS THE QUESTION!
    Belo, Yolanda
    Moro, Sergio
    Martins, Antonio
    Ramos, Pedro
    Costa, Joana Martinho
    Esmerado, Joaquim
    JOURNAL OF INFORMATION TECHNOLOGY EDUCATION-INNOVATIONS IN PRACTICE, 2019, 18: 153-170
  • [29] Does Educator Training or Experience Affect the Quality of Multiple-Choice Questions?
    Webb, Emily M.
    Phuong, Jonathan S.
    Naeger, David M.
    ACADEMIC RADIOLOGY, 2015, 22 (10): 1317-1322
  • [30] DEMONSTRATION OF A WINDOWS-BASED PROGRAM FOR MULTIPLE-CHOICE QUESTIONS
    WHELPTON, R
    TROUT, SJ
    BRITISH JOURNAL OF PHARMACOLOGY, 1994, 112: U222-U222