Crowd control: Effectively utilizing unscreened crowd workers for biomedical data annotation

Cited by: 18
Authors
Cocos, Anne [1, 2]
Qian, Ting [1 ]
Callison-Burch, Chris [2 ]
Masino, Aaron J. [1 ]
Affiliations
[1] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
[2] Univ Penn, Comp & Informat Sci Dept, Philadelphia, PA 19104 USA
Keywords
Text annotations; Crowdsourcing; EHR data; Logistic regression; Sentence classification; CHALLENGES; TEXT
DOI
10.1016/j.jbi.2017.04.003
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Annotating unstructured text in electronic health record (EHR) data is usually a necessary step for conducting machine learning research on such datasets. Manual annotation by domain experts provides data of the best quality, but has become increasingly impractical given the rapid increase in the volume of EHR data. In this article, we examine the effectiveness of crowdsourcing with unscreened online workers as an alternative for transforming unstructured text in EHRs into annotated data that are directly usable in supervised learning models. We find the crowdsourced annotation data to be just as effective as expert data in training a sentence classification model to detect mentions of abnormal ear anatomy in radiology reports of audiology. Furthermore, we find that enabling workers to self-report a confidence level for each annotation can help researchers pinpoint less accurate annotations that require expert scrutiny. Our findings suggest that even crowd workers without specific domain knowledge can contribute effectively to the task of annotating unstructured EHR datasets. © 2017 Elsevier Inc. All rights reserved.
Pages: 86-92
Number of pages: 7