Crowdsourcing Ground Truth for Medical Relation Extraction

Cited by: 23
Authors
Dumitrache, Anca [1, 2]
Aroyo, Lora [1]
Welty, Chris [3, 4]
Affiliations
[1] Vrije Univ Amsterdam, De Boelelaan 1085, NL-1081 HV Amsterdam, Netherlands
[2] IBM Ctr Adv Studies Benelux, Armonk, NY 10504 USA
[3] Google Res, New York, NY USA
[4] Google, New York, NY USA
Keywords
Ground truth; relation extraction; clinical natural language processing; natural language ambiguity; inter-annotator disagreement; crowdtruth; crowd truth; UMLS
DOI
10.1145/3152889
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cognitive computing systems require human-labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found that this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
Pages: 20