Crowdsourcing Ground Truth for Medical Relation Extraction

Cited by: 23
Authors
Dumitrache, Anca [1, 2]
Aroyo, Lora [1]
Welty, Chris [3, 4]
Affiliations
[1] Vrije Univ Amsterdam, De Boelelaan 1085, NL-1081 HV Amsterdam, Netherlands
[2] IBM Ctr Adv Studies Benelux, Armonk, NY 10504 USA
[3] Google Res, New York, NY USA
[4] Google, New York, NY USA
Keywords
Ground truth; relation extraction; clinical natural language processing; natural language ambiguity; inter-annotator disagreement; crowdtruth; crowd truth; UMLS
DOI
10.1145/3152889
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cognitive computing systems require human labeled data for evaluation and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure, which account for ambiguity in both human and machine performance on this task.
Pages: 20