Automated Scoring of Constructed-Response Science Items: Prospects and Obstacles

Cited by: 71
Authors
Liu, Ou Lydia [1 ]
Brew, Chris [2 ]
Blackmore, John [1 ]
Gerard, Libby [3 ]
Madhok, Jacquie [3 ]
Linn, Marcia C. [4 ]
Affiliations
[1] Educ Testing Serv, Princeton, NJ 08541 USA
[2] Nuance, Sunnyvale, CA 94085 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
[4] Univ Calif Berkeley, Grad Sch Educ, Berkeley, CA 94720 USA
Funding
US National Science Foundation
Keywords
automated scoring; constructed-response items; c-rater (TM); science assessment; knowledge integration; weighted kappa; technology; agreement; feedback; choice
DOI
10.1111/emip.12028
Chinese Library Classification (CLC)
G40 [Education]
Subject Classification Codes
040101; 120403
Abstract
Content-based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept-based scoring tool for content-based scoring, c-rater (TM), for four science items with rubrics aiming to differentiate among multiple levels of understanding. The items showed moderate to good agreement with human scores. The findings suggest that automated scoring has the potential to score constructed-response items with complex scoring rubrics, but in its current design cannot replace human raters. This article discusses sources of disagreement and factors that could potentially improve the accuracy of concept-based automated scoring.
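The abstract summarizes human-machine agreement for items scored against multi-level rubrics, and weighted kappa appears among the keywords; quadratic weighted kappa is the statistic typically used to report such agreement between automated and human scores. The sketch below is purely illustrative: the function name and the score vectors are hypothetical and not taken from the study; it only shows how the statistic is computed for a five-level rubric.

import numpy as np

def quadratic_weighted_kappa(human, machine, n_levels):
    """Quadratic weighted kappa between two integer score vectors on levels 0..n_levels-1."""
    human = np.asarray(human)
    machine = np.asarray(machine)
    # Observed joint distribution of (human, machine) score pairs.
    observed = np.zeros((n_levels, n_levels))
    for h, m in zip(human, machine):
        observed[h, m] += 1
    observed /= observed.sum()
    # Expected joint distribution if the two raters scored independently.
    expected = np.outer(np.bincount(human, minlength=n_levels),
                        np.bincount(machine, minlength=n_levels)).astype(float)
    expected /= expected.sum()
    # Quadratic disagreement weights: larger penalty the further apart the scores are.
    i, j = np.indices((n_levels, n_levels))
    weights = (i - j) ** 2 / (n_levels - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical human and machine scores on a five-level rubric (0-4).
human_scores = [0, 1, 2, 2, 3, 4, 3, 1, 2, 4]
machine_scores = [0, 1, 2, 3, 3, 4, 2, 1, 2, 3]
print(round(quadratic_weighted_kappa(human_scores, machine_scores, 5), 3))

Values near 1 indicate close agreement; the quadratic weights penalize large score discrepancies more heavily than adjacent-level disagreements, which suits ordinal rubrics of the kind the study describes.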
Pages: 19-28 (10 pages)