Metrics for Estimating Validity, Reliability and Bias in Peer Assessment

Cited by: 0
Authors
Molina-Carmona, Rafael [1]
Satorre-Cuerda, Rosana [1]
Compan-Rosique, Patricia [1]
Llorens-Largo, Faraon [1]
Affiliations
[1] Univ Alicante, Catedra Santander UA Transformac Digital, Ctra San Vicente del Raspeig S-N, Alicante 03690, Spain
Keywords
peer assessment; success rate; agreement degree; reliability; validity; bias; confusion matrix; automatic classification
DOI
Not available
Chinese Library Classification
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
Peer assessment is a widespread way of evaluating and rating the quality of work in education. Although it has proved to be a very effective learning instrument, it is subject to potential problems of reliability and validity and to several biases. Most studies that examine and try to solve these problems focus on specific cases, and the statistics they use to measure reliability, validity or bias are global: they give a value for the whole process but do not allow individual reviewers to be studied. This work takes a different approach. It proposes metrics for the reliability and validity of each reviewer, together with an approximation of the biases that may appear in the assessment process, so that the review process can itself be assessed. An analogy is drawn between the work of a reviewer in a peer assessment process and the operation of an automatic classifier, which allows the usual measures of classifier quality to be used to establish the quality of peer assessment. Reviewers are characterized by their confusion matrices and by six new indicators: success rate (which estimates validity); agreement degree (as a measure of reliability); assessment median and its interquartile range (to estimate central tendency and restriction of range biases); and average distance to the diagonal and its standard deviation (to detect possible leniency and harshness biases). The method provides indicators of each reviewer's performance and supports the detection of different reviewer profiles, so that the teacher can assess the students' work as reviewers and introduce correction mechanisms into the final assessment of the works. A practical application to an engineering degree is provided to illustrate the potential of the method.
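The reviewer-as-classifier analogy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the definitions here are assumptions. Grades are assumed to be integers on an ordinal scale 0..n_levels-1, the "true" class of each work is assumed to be a reference grade (for example the teacher's), success rate is taken as the fraction of works on the diagonal of the confusion matrix, and distance to the diagonal is taken as the signed difference between assigned and reference grade. Agreement degree is left out because the abstract does not say how it is computed. The function names and the sample data are hypothetical.

```python
from statistics import mean, median, pstdev, quantiles


def confusion_matrix(reference, assigned, n_levels):
    """Rows: reference grade; columns: grade given by the reviewer."""
    matrix = [[0] * n_levels for _ in range(n_levels)]
    for ref, got in zip(reference, assigned):
        matrix[ref][got] += 1
    return matrix


def reviewer_indicators(reference, assigned, n_levels):
    """Per-reviewer indicators under the assumptions stated in the lead-in."""
    matrix = confusion_matrix(reference, assigned, n_levels)
    hits = sum(matrix[k][k] for k in range(n_levels))
    # Signed distance to the diagonal: positive when the reviewer grades
    # above the reference (lenient), negative when below (harsh).
    distances = [got - ref for ref, got in zip(reference, assigned)]
    # Quartiles of the grades the reviewer handed out (needs >= 2 grades).
    q1, _, q3 = quantiles(assigned, n=4, method="inclusive")
    return {
        "confusion_matrix": matrix,
        "success_rate": hits / len(assigned),
        "assessment_median": median(assigned),
        "assessment_iqr": q3 - q1,
        "mean_distance_to_diagonal": mean(distances),
        "std_distance_to_diagonal": pstdev(distances),
    }


# Hypothetical data: six works graded on a four-level scale (0..3).
reference_grades = [0, 1, 2, 3, 2, 1]
reviewer_grades = [1, 1, 3, 3, 3, 2]
indicators = reviewer_indicators(reference_grades, reviewer_grades, n_levels=4)
for name, value in indicators.items():
    print(name, value)
```

Under these assumptions, a mean distance well above zero would point to leniency, one well below zero to harshness, and a small interquartile range of the assigned grades to a possible restriction of range.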
Pages: 968-980
Number of pages: 13