Crowdsourcing Ground Truth for Medical Relation Extraction

Cited by: 23
Authors
Dumitrache, Anca [1, 2]
Aroyo, Lora [1]
Welty, Chris [3, 4]
Affiliations
[1] Vrije Univ Amsterdam, De Boelelaan 1085, NL-1081 HV Amsterdam, Netherlands
[2] IBM Ctr Adv Studies Benelux, Armonk, NY 10504 USA
[3] Google Res, New York, NY USA
[4] Google, New York, NY USA
Keywords
Ground truth; relation extraction; clinical natural language processing; natural language ambiguity; inter-annotator disagreement; crowdtruth; crowd truth; UMLS
DOI
10.1145/3152889
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cognitive computing systems require human-labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
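The abstract's two technical ideas, treating annotator disagreement as a signal and weighting precision and recall by ambiguity, can be illustrated with a minimal Python sketch. It assumes each crowd worker marks a set of relations per sentence (the relation set and all function names are hypothetical), scores a relation as the cosine similarity between the summed worker vector and that relation's unit vector, and then, as an illustrative assumption rather than the paper's exact definition, weights each positive prediction by the crowd score (toward true positives) and by its complement (toward false positives), and each negative prediction by the crowd score (toward false negatives).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label set for illustration; the paper's annotation task design may differ.
RELATIONS = ["cause", "treat", "none"]


def sentence_vector(worker_votes: Sequence[Sequence[str]],
                    relations: Sequence[str] = RELATIONS) -> List[int]:
    """Sum each worker's binary vote vector into one count per relation."""
    counts = [0] * len(relations)
    for chosen in worker_votes:
        for i, rel in enumerate(relations):
            if rel in chosen:
                counts[i] += 1
    return counts


def relation_score(vec: Sequence[int], idx: int) -> float:
    """Cosine similarity between the sentence vector and the unit vector of relation idx.

    With a unit vector e_idx the cosine reduces to vec[idx] / ||vec||, so the score is
    high when workers agree on the relation and lower when their votes are spread out.
    """
    norm = math.sqrt(sum(v * v for v in vec))
    return vec[idx] / norm if norm else 0.0


def weighted_prf(scores: Sequence[float],
                 predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Ambiguity-weighted precision, recall and F1 (assumed, illustrative weighting).

    scores[k] is the crowd score in [0, 1] that sentence k expresses the relation;
    predicted[k] is the classifier's binary decision for sentence k.
    """
    tp = sum(s for s, p in zip(scores, predicted) if p)        # positives, weighted by crowd support
    fp = sum(1.0 - s for s, p in zip(scores, predicted) if p)  # positives, weighted by crowd doubt
    fn = sum(s for s, p in zip(scores, predicted) if not p)    # misses, weighted by crowd support
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    votes = [["cause"], ["cause", "treat"], ["none"], ["cause"]]
    vec = sentence_vector(votes)
    print(vec, round(relation_score(vec, RELATIONS.index("cause")), 3))
    print(weighted_prf([1.0, 0.5, 0.0], [True, True, False]))
```

Because the weights come from the crowd rather than from a single adjudicated label, this sketch penalizes a classifier only partially for a positive call on a sentence that the workers themselves disagreed about.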
Pages: 20
Related papers (50 in total)
  • [41] Truth Discovery and Crowdsourcing Aggregation: A Unified Perspective
    Gao, Jing
    Li, Qi
    Zhao, Bo
    Fan, Wei
    Han, Jiawei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (12): 2049-2050
  • [42] Incentivizing the Workers for Truth Discovery in Crowdsourcing with Copiers
    Jiang, Lingyun
    Niu, Xiaofu
    Xu, Jia
    Yang, Dejun
    Xu, Lijie
    2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019), 2019: 1286-1295
  • [44] Gabor and Weber Feature Extraction Performance Based on Urban Atlas Ground Truth
    Stan, Mihaela
    Popescu, Anca
    Stoichescu, Dan Alexandru
    UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2016, 78 (03): 149-156
  • [45] Crowdsourcing Catatonia: Medical Crowdsourcing in Challenging Clinical Cases
    Sheehan, Kathleen
    Bahadur, Alexander G.
    Perdue, Jason C.
    JOURNAL OF THE ACADEMY OF CONSULTATION-LIAISON PSYCHIATRY, 2022, 63: S149-S150
  • [47] Generation of annotated multimodal ground truth datasets for abdominal medical image registration
    Bauer, Dominik F.
    Russ, Tom
    Waldkirch, Barbara I.
    Tönnes, Christian
    Segars, William P.
    Schad, Lothar R.
    Zöllner, Frank G.
    Golla, Alena-Kathrin
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2021, 16 (08): 1277-1285
  • [48] Pixel Diffuser: Practical Interactive Medical Image Segmentation without Ground Truth
    Ju, Mingeon
    Yang, Jaewoo
    Lee, Jaeyoung
    Lee, Moonhyun
    Ji, Junyung
    Kim, Younghoon
    BIOENGINEERING-BASEL, 2023, 10 (11)
  • [49] Ground Truth Or Dare: Factors Affecting The Creation Of Medical Datasets For Training AI
    Zajac, Hubert D.
    Avlona, Natalia R.
    Andersen, Tariq O.
    Kensing, Finn
    Shklovski, Irina
    PROCEEDINGS OF THE 2023 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2023, 2023: 351-362
  • [50] A hybrid approach toward biomedical relation extraction training corpora: combining distant supervision with crowdsourcing
    Sousa, Diana
    Lamurias, Andre
    Couto, Francisco M.
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2020