Automated Item Generation: impact of item variants on performance and standard setting

Cited: 4
Authors
Westacott, R. [1 ]
Badger, K. [2 ]
Kluth, D. [3 ]
Gurnell, M. [4 ,5 ]
Reed, M. W. R. [6 ]
Sam, A. H. [2 ]
Affiliations
[1] Univ Birmingham, Birmingham Med Sch, Birmingham, England
[2] Imperial Coll London, Imperial Coll, Sch Med, London, England
[3] Univ Edinburgh, Edinburgh Med Sch, Edinburgh, Scotland
[4] Univ Cambridge, Wellcome MRC Inst Metab Sci, Cambridge, England
[5] Cambridge Univ Hosp, NIHR Cambridge Biomed Res Ctr, Cambridge, England
[6] Univ Sussex, Brighton & Sussex Med Sch, Brighton, England
Keywords
Assessment; Automated item generation; Multiple choice questions; Standard setting; MULTIPLE-CHOICE QUESTIONS; STUDENTS; QUALITY; MEDICINE; ERRORS;
DOI
10.1186/s12909-023-04457-0
CLC Number: G40 [Education]
Discipline codes: 040101; 120403
Abstract
Background: Automated Item Generation (AIG) uses computer software to create multiple items from a single question model. There is currently a lack of data on whether item variants of a single question result in differences in student performance or human-derived standard setting. The purpose of this study was to use 50 Multiple Choice Questions (MCQs) as models to create four distinct tests, which were standard set and given to final-year UK medical students, and then to compare the performance and standard setting data for each.
Methods: Pre-existing questions from the UK Medical Schools Council (MSC) Assessment Alliance item bank, created using traditional item-writing techniques, were used to generate four 'isomorphic' 50-item MCQ tests using AIG software. Isomorphic questions use the same question template with minor alterations to test the same learning outcome. All UK medical schools were invited to deliver one of the four papers as an online formative assessment for their final-year students. Each test was standard set using a modified Angoff method. Thematic analysis was conducted for item variants with high and low levels of variance in facility (for student performance) and average scores (for standard setting).
Results: Two thousand two hundred and eighteen students from 12 UK medical schools participated, with each school using one of the four papers. The average facility of the four papers ranged from 0.55 to 0.61, and the cut score ranged from 0.58 to 0.61. Twenty item models had a facility difference > 0.15, and 10 item models had a difference in standard setting of > 0.1. Variation in parameters that could alter clinical reasoning strategies had the greatest impact on item facility.
Conclusions: Item facility varied to a greater extent than the standard set. This difference may relate to variants causing greater disruption of clinical reasoning strategies in novice learners compared with experts, but it is confounded by the possibility that the performance differences may be explained at school level, and therefore warrants further study.
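The two quantities compared in the abstract — item facility and a modified Angoff cut score — follow standard definitions: facility is the proportion of examinees answering an item correctly, and a modified Angoff cut score averages judges' estimates of the probability that a minimally competent candidate would answer each item correctly. A minimal illustrative sketch (not from the paper; all function names and data below are made up for illustration):

```python
def item_facility(responses):
    """Facility = proportion of examinees answering the item correctly
    (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

def angoff_cut_score(judge_estimates):
    """Modified Angoff: for each item, average the judges' estimated
    probabilities that a minimally competent candidate answers it correctly,
    then average across items to give the test-level cut score."""
    per_item = [sum(item) / len(item) for item in judge_estimates]
    return sum(per_item) / len(per_item)

# Ten hypothetical examinees' responses to one item variant
print(item_facility([1, 1, 0, 1, 0, 1, 1, 0, 1, 1]))  # 0.7

# Three hypothetical judges' estimates for each of two items
print(angoff_cut_score([[0.6, 0.7, 0.65], [0.5, 0.55, 0.6]]))
```

On this reading, the study's finding is that facility (the first quantity) shifted more across isomorphic variants than the judge-derived cut score (the second).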
Pages: 13
Related papers (50 total)
  • [31] EFFECT OF THE MEDIUM OF ITEM PRESENTATION ON EXAMINEE PERFORMANCE AND ITEM CHARACTERISTICS
    SPRAY, JA
    ACKERMAN, TA
    RECKASE, MD
    CARLSON, JE
    JOURNAL OF EDUCATIONAL MEASUREMENT, 1989, 26 (03) : 261 - 271
  • [32] Item selection strategy for reducing the number of items rated in an Angoff standard setting study
    Ferdous, Abdullah A.
    Plake, Barbara S.
    EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 2007, 67 (02) : 193 - 206
  • [33] Teachers' ability to estimate item difficulty: A test of the assumptions in the Angoff standard setting method
    Impara, JC
    Plake, BS
    JOURNAL OF EDUCATIONAL MEASUREMENT, 1998, 35 (01) : 69 - 81
  • [34] Automated Test-Item Generation System for Retrieval Practice in Radiology Education
    Gunabushanam, Gowthaman
    Taylor, Caroline R.
    Mathur, Mahan
    Bokhari, Jamal
    Scoutt, Leslie M.
    ACADEMIC RADIOLOGY, 2019, 26 (06) : 851 - 859
  • [35] Relationship Between Assessment Item Format and Item Performance Characteristics
    Phipps, Stephen D.
    Brackbill, Marcia L.
    AMERICAN JOURNAL OF PHARMACEUTICAL EDUCATION, 2009, 73 (08)
  • [36] Innovations in Measuring Rater Accuracy in Standard Setting: Assessing Fit to Item Characteristic Curves
    Hurtz, Gregory M.
    Jones, J. Patrick
    APPLIED MEASUREMENT IN EDUCATION, 2009, 22 (02) : 120 - 143
  • [37] A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation
    Falcão, Filipe
    Pereira, Daniela Marques
    Gonçalves, Nuno
    De Champlain, Andre
    Costa, Patrício
    Pêgo, José Miguel
    ADVANCES IN HEALTH SCIENCES EDUCATION, 2023, 28 (05) : 1441 - 1465
  • [38] TRACING THE IMPACT OF ITEM-BY-ITEM INFORMATION ACCESSING ON UNCERTAINTY REDUCTION
    JACOBY, J
    JACCARD, JJ
    CURRIM, I
    KUSS, A
    ANSARI, A
    TROUTMAN, T
    JOURNAL OF CONSUMER RESEARCH, 1994, 21 (02) : 291 - 303
  • [40] COMPARISON OF ITEM AGREEMENT: THE IMPACT OF ITEM ORDER, RESPONSE OPTIONS AND TREATMENT
    Fernandes, L. L.
    King-Kallimanis, B. L.
    Weinstock, C.
    Tang, S.
    Kluetz, P. G.
    VALUE IN HEALTH, 2019, 22 : S106 - S106