Crowdsourcing Ground Truth for Medical Relation Extraction

Cited by: 23
|
Authors
Dumitrache, Anca [1 ,2 ]
Aroyo, Lora [1 ]
Welty, Chris [3 ,4 ]
Affiliations
[1] Vrije Univ Amsterdam, De Boelelaan 1085, NL-1081 HV Amsterdam, Netherlands
[2] IBM Ctr Adv Studies Benelux, Armonk, NY 10504 USA
[3] Google Res, New York, NY USA
[4] Google, New York, NY USA
Keywords
Ground truth; relation extraction; clinical natural language processing; natural language ambiguity; inter-annotator disagreement; crowdtruth; crowd truth; UMLS;
DOI
10.1145/3152889
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cognitive computing systems require human labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
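The ambiguity-weighted precision, recall, and F-measure the abstract mentions can be sketched as follows, under the assumption that each sentence carries a crowd agreement score in [0, 1] and that score mass (rather than a hard 0/1 label) is credited to true positives and false negatives. This is an illustrative formulation, not necessarily the paper's exact CrowdTruth definition:

```python
def weighted_prf(gold_scores, predictions):
    """Ambiguity-weighted precision/recall/F1 sketch.

    gold_scores  -- crowd agreement score in [0, 1] per sentence (hypothetical input)
    predictions  -- machine 0/1 labels for the same sentences
    """
    # A predicted positive earns credit equal to the crowd score,
    # and is penalized by the remaining (1 - score) mass.
    tp = sum(s for s, p in zip(gold_scores, predictions) if p)
    fp = sum(1 - s for s, p in zip(gold_scores, predictions) if p)
    # A predicted negative misses whatever score mass the crowd assigned.
    fn = sum(s for s, p in zip(gold_scores, predictions) if not p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# An ambiguous sentence (score near 0.5) contributes partial credit
# whichever way the machine labels it, so neither metric is fully
# penalized for disagreeing with a low-consensus annotation.
p, r, f = weighted_prf([0.9, 0.6, 0.2, 0.1], [1, 1, 1, 0])
```

With binary gold scores (all 0 or 1) these expressions reduce to the standard precision, recall, and F1.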
Pages: 20
Related Papers (50 total)
  • [1] Empirical methodology for crowdsourcing ground truth
    Dumitrache, Anca
    Inel, Oana
    Timmermans, Benjamin
    Ortiz, Carlos
    Sips, Robert-Jan
    Aroyo, Lora
    Welty, Chris
    SEMANTIC WEB, 2021, 12 (03) : 403 - 421
  • [2] WeLineation: Crowdsourcing delineations for reliable ground truth estimation
    Goel, Saksham
    Sharma, Yash
    Jauer, Malte-Levin
    Deserno, Thomas M.
    MEDICAL IMAGING 2020: IMAGING INFORMATICS FOR HEALTHCARE, RESEARCH, AND APPLICATIONS, 2020, 11318
  • [3] Integration of Crowdsourcing into Ontology Relation Extraction
    Kardinata, Eunike Andriani
    Rakhmawati, Nur Aini
    FIFTH INFORMATION SYSTEMS INTERNATIONAL CONFERENCE, 2019, 161 : 826 - 833
  • [4] Ground-truth generation through crowdsourcing with probabilistic indexes
    Sánchez, Joan Andreu
    Vidal, Enrique
    Bosch, Vicente
    Quirós, Lorenzo
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (30) : 18879 - 18895
  • [5] Multi-Class Ground Truth Inference in Crowdsourcing with Clustering
    Zhang, Jing
    Sheng, Victor S.
    Wu, Jian
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (04) : 1080 - 1085
  • [6] Label Consistency-Based Ground Truth Inference for Crowdsourcing
    Li, Jiao
    Jiang, Liangxiao
    Zhang, Wenjun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [7] Collusion detection and ground truth inference in crowdsourcing for labeling tasks
    Song, Changyue
    Liu, Kaibo
    Zhang, Xi
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [8] Self-Crowdsourcing Training for Relation Extraction
    Abad, Azad
    Nabi, Moin
    Moschitti, Alessandro
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 518 - 523
  • [9] Crowdsourcing truth
    [Anonymous]
    NEW SCIENTIST, 2016, 230 (3072) : 5 - 5
  • [10] Truth in Crowdsourcing
    Cox, Landon P.
    IEEE SECURITY & PRIVACY, 2011, 9 (05) : 74 - 76