Crowd control: Effectively utilizing unscreened crowd workers for biomedical data annotation

Cited by: 18
Authors
Cocos, Anne [1, 2]
Qian, Ting [1]
Callison-Burch, Chris [2]
Masino, Aaron J. [1]
Affiliations
[1] Children's Hospital of Philadelphia, Department of Biomedical and Health Informatics, Philadelphia, PA 19104, USA
[2] University of Pennsylvania, Department of Computer and Information Science, Philadelphia, PA 19104, USA
Keywords
Text annotations; Crowdsourcing; EHR data; Logistic regression; Sentence classification; Challenges; Text
DOI
10.1016/j.jbi.2017.04.003
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Annotating unstructured text in Electronic Health Record (EHR) data is usually a necessary step before conducting machine learning research on such datasets. Manual annotation by domain experts yields the highest-quality data, but it has become increasingly impractical given the rapid growth in the volume of EHR data. In this article, we examine the effectiveness of crowdsourcing with unscreened online workers as an alternative for transforming unstructured EHR text into annotated data that can be used directly in supervised learning models. We find that crowdsourced annotations are just as effective as expert annotations for training a sentence classification model to detect mentions of abnormal ear anatomy in radiology reports from audiology. Furthermore, we find that allowing workers to self-report a confidence level for each annotation can help researchers pinpoint less accurate annotations that require expert scrutiny. Our findings suggest that even crowd workers without specific domain knowledge can contribute effectively to annotating unstructured EHR datasets. (C) 2017 Elsevier Inc. All rights reserved.
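The abstract does not say how the self-reported confidence levels were used in practice. The following is only a minimal, hypothetical Python sketch of the general idea, not the authors' method. It assumes each crowd annotation carries a binary sentence label and an integer confidence on a 1-5 scale. The scale, the review threshold of 3, the names `Annotation`, `flag_for_expert_review`, and `accuracy_by_confidence`, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Annotation:
    sentence_id: str
    label: bool       # True if the worker marks the sentence as mentioning abnormal ear anatomy
    confidence: int   # self-reported confidence; a 1-5 scale is assumed here for illustration


def flag_for_expert_review(annotations, threshold=3):
    """Return sorted IDs of sentences with at least one annotation below the confidence threshold."""
    return sorted({a.sentence_id for a in annotations if a.confidence < threshold})


def accuracy_by_confidence(annotations, expert_labels):
    """Fraction of crowd labels that agree with expert labels, grouped by confidence level.

    Annotations for sentences without an expert label are skipped.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for a in annotations:
        if a.sentence_id in expert_labels:
            totals[a.confidence] += 1
            hits[a.confidence] += int(a.label == expert_labels[a.sentence_id])
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# Invented toy data, not taken from the paper.
annotations = [
    Annotation("s1", True, 5),
    Annotation("s2", False, 2),
    Annotation("s3", True, 4),
    Annotation("s4", True, 1),
]
expert_labels = {"s1": True, "s2": True, "s3": True, "s4": False}

print(flag_for_expert_review(annotations))                   # ['s2', 's4']
print(accuracy_by_confidence(annotations, expert_labels))    # {1: 0.0, 2: 0.0, 4: 1.0, 5: 1.0}
```

On this toy data, the low-confidence annotations are exactly the ones that disagree with the expert labels. That pattern is the kind of signal the abstract describes, but the sketch makes no claim about the paper's actual results.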
Pages: 86-92
Page count: 7