Learning to Reuse Distractors to Support Multiple-Choice Question Generation in Education

Cited by: 6
Authors
Bitew, Semere Kiros [1]
Hadifar, Amir [1]
Sterckx, Lucas [2]
Deleu, Johannes [1]
Develder, Chris [1]
Demeester, Thomas [1]
Affiliations
[1] Ghent Univ imec, Internet Technol & Data Sci Lab, Text to Knowledge Team, B-9052 Ghent, Belgium
[2] LynxCare, B-3000 Leuven, Belgium
Keywords
Context modeling; Task analysis; Semantics; Agricultural machinery; Vocabulary; Guidelines; Benchmark testing; Distractor generation; multiple-choice question (MCQ); natural language processing (NLP); online learning; transformers
DOI
10.1109/TLT.2022.3226523
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Multiple-choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, owing to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is to devise relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This article studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers in creating new MCQs, by the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform a static feature-based approach. For our best-performing context-aware model, on average, three distractors out of the ten shown to teachers were rated as high-quality distractors. We create a performance benchmark, and make it public, to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test of 298 educational questions covering multiple subjects and languages and a 77k multilingual pool of distractor vocabulary for future research.
Pages: 375-390
Page count: 16