Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

Cited: 10
Authors
Moore, Steven [1 ]
Nguyen, Huy A. [1 ]
Chen, Tianying [1 ]
Stamper, John [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Question evaluation; Question quality; Rule-based; GPT-4; ITEM WRITING FLAWS;
DOI
10.1007/978-3-031-42682-7_16
Chinese Library Classification
TP39 [Computer Applications];
Discipline codes
081203; 0835;
Abstract
Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom usage. Existing methods for evaluating multiple-choice questions often focus on machine-readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine-learning-based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, as compared to 79% by GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in the student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.
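The abstract does not enumerate the 19 flaws or the paper's actual rules, but the general shape of a rule-based check can be sketched. The following is a minimal illustration (not the authors' implementation) assuming two commonly cited item-writing flaws: an "all/none of the above" answer option, and unemphasized negative wording in the question stem.

```python
import re

def check_flaws(stem: str, options: list[str]) -> list[str]:
    """Return names of detected item-writing flaws (illustrative subset only)."""
    flaws = []
    # Flaw 1: "all of the above" / "none of the above" used as an option.
    if any(re.search(r"\b(all|none) of the above\b", o, re.I) for o in options):
        flaws.append("all/none-of-the-above option")
    # Flaw 2: negative wording ("not", "except") in the stem.
    if re.search(r"\b(not|except)\b", stem, re.I):
        flaws.append("negative wording in stem")
    return flaws

# Example: this item triggers both illustrative rules.
print(check_flaws("Which of these is NOT a mammal?",
                  ["Dog", "Trout", "All of the above"]))
```

A full system along these lines would encode one such predicate per flaw and report the set of matches per question, which is consistent with the per-flaw detection rates the abstract reports.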
Pages: 229-245 (17 pages)
Related papers
50 total
  • [1] From GPT-3 to GPT-4: On the Evolving Efficacy of LLMs to Answer Multiple-Choice Questions for Programming Classes in Higher Education
    Savelka, Jaromir
    Agarwal, Arav
    Bogart, Christopher
    Sakr, Majd
    COMPUTER SUPPORTED EDUCATION, CSEDU 2023, 2024, 2052 : 160 - 182
  • [2] Assessing declarative and procedural knowledge using multiple-choice questions
    Abu-Zaid, Ahmed
    Khan, Tehreem A.
    MEDICAL EDUCATION ONLINE, 2013, 18
  • [3] Quality assessment of multiple-choice questions
    Akram, Zareena
    Aman, Khaliq
    Waheed, Nomira
    Rahim, Shiekh Kashif
    Hanif, Fouzia
    Abdullah, Zainab
    RAWAL MEDICAL JOURNAL, 2025, 50 (01): : 191 - 194
  • [4] GPT-4 generated answer rationales to multiple choice assessment questions in undergraduate medical education
    Ch'en, Peter Y.
    Day, Wesley
    Pekson, Ryan C.
    Barrientos, Juan
    Burton, William B.
    Ludwig, Allison B.
    Jariwala, Sunit P.
    Cassese, Todd
    BMC MEDICAL EDUCATION, 2025, 25 (01)
  • [5] Using cognitive models to develop quality multiple-choice questions
    Pugh, Debra
    De Champlain, Andre
    Gierl, Mark
    Lai, Hollis
    Touchie, Claire
    MEDICAL TEACHER, 2016, 38 (08) : 838 - 843
  • [6] Inconsistently Accurate: Repeatability of GPT-3.5 and GPT-4 in Answering Radiology Board-style Multiple Choice Questions
    Ballard, David H.
    RADIOLOGY, 2024, 311 (02)
  • [7] QUALITY AND FEATURE OF MULTIPLE-CHOICE QUESTIONS IN EDUCATION
    Jia, Bing
    He, Dan
    Zhu, Zhemin
    PROBLEMS OF EDUCATION IN THE 21ST CENTURY, 2020, 78 (04) : 576 - 594
  • [8] Assessing the quality of automatic-generated short answers using GPT-4
    Rodrigues, L.
    Dwan Pereira, F.
    Cabral, L.
    Gašević, D.
    Ramalho, G.
    Ferreira Mello, R.
    COMPUTERS AND EDUCATION: ARTIFICIAL INTELLIGENCE, 2024, 7
  • [9] Assessing the Quality of Multiple-Choice Test Items
    Clifton, Sandra L.
    Schriner, Cheryl L.
    NURSE EDUCATOR, 2010, 35 (01) : 12 - 16
  • [10] Information Quality in The Analysis of Multiple-Choice Questions on An Example of Assessing the Importance of The European Union
    Stanimir, Agnieszka
    EDUCATION EXCELLENCE AND INNOVATION MANAGEMENT: A 2025 VISION TO SUSTAIN ECONOMIC DEVELOPMENT DURING GLOBAL CHALLENGES, 2020, : 1410 - 1428