When to conduct probabilistic linkage vs. deterministic linkage? A simulation study

Cited by: 67
Authors
Zhu, Ying [1]
Matsuyama, Yutaka [1]
Ohashi, Yasuo [1,2]
Setoguchi, Soko [3,4]
Affiliations
[1] Univ Tokyo, Grad Sch Med, Dept Biostat, Tokyo 1130033, Japan
[2] Chuo Univ, Dept Integrated Sci & Engn Sustainable Soc, Tokyo 112, Japan
[3] Duke Univ, Sch Med, Duke Clin Res Inst, Durham, NC USA
[4] Univ Tokyo, Grad Sch Med, Dept Pharmacoepidemiol, Tokyo 1130033, Japan
Keywords
Record linkage; Probabilistic linkage; Deterministic linkage; Simulation study; Comparative validity; RECORD LINKAGE; HOSPITAL DISCHARGE; LINKING; HEALTH; REGISTRY; IDENTIFIERS; TRANSPARENT; ACCURACY; COHORT; CLAIMS
DOI
10.1016/j.jbi.2015.05.012
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Introduction: When unique identifiers are unavailable, the success of record linkage depends heavily on data quality and on the types of variables available. Although probabilistic linkage theoretically captures more true matches than deterministic linkage because it tolerates imperfect identifiers, previous studies have reported inconclusive results, likely because of differences in data quality, in how the linkage methodology was implemented, and in the validation method. This simulation study aimed to identify the data characteristics that affect the performance of probabilistic vs. deterministic linkage.

Methods: We created ninety-six scenarios that represent real-life situations in which only non-unique identifiers are available. We systematically varied the discriminative power of the identifiers, the rates of missing values and errors, and the file size to produce a range of linkage patterns and difficulty levels. We assessed the performance difference between the linkage methods using standard validity measures and computation time.

Results: Across scenarios, deterministic linkage had an advantage in positive predictive value (PPV), while probabilistic linkage had an advantage in sensitivity. Probabilistic linkage uniformly outperformed deterministic linkage, as it produced a better trade-off between sensitivity and PPV regardless of data quality. However, when the rates of missing values and errors were low, deterministic linkage did not perform significantly worse. The SAS implementation of deterministic linkage took less than 1 min, whereas probabilistic linkage took 2 min to 2 h, depending on file size.

Discussion: Our simulation study demonstrated that the intrinsic rate of missing values and errors in the linkage variables was the key factor in choosing between linkage methods. In general, probabilistic linkage was the better choice, but for data of exceptionally good quality (<5% error), deterministic linkage was the more resource-efficient choice.

(C) 2015 Elsevier Inc. All rights reserved.
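The Methods describe injecting missing values and errors into non-unique identifiers at controlled rates. The abstract does not say how errors were generated, so the following Python sketch assumes one simple mechanism: each field independently becomes missing with probability `missing_rate` or receives a single random lowercase-letter substitution with probability `error_rate`. The function name `corrupt_record` and the field names are hypothetical and are not taken from the study.

```python
import random
import string


def corrupt_record(record, missing_rate, error_rate, rng):
    """Return a copy of `record` with simulated missingness and typographical errors.

    Assumed mechanism (hypothetical, for illustration only): for each field,
    one uniform draw decides the outcome, so the field becomes None with
    probability `missing_rate`, gets one character overwritten by a random
    lowercase letter with probability `error_rate`, and is otherwise kept.
    Requires missing_rate + error_rate <= 1.
    """
    if missing_rate + error_rate > 1:
        raise ValueError("missing_rate + error_rate must not exceed 1")
    corrupted = {}
    for field, value in record.items():
        draw = rng.random()
        if draw < missing_rate:
            corrupted[field] = None
        elif draw < missing_rate + error_rate and value:
            text = str(value)
            pos = rng.randrange(len(text))
            # The substituted letter can occasionally equal the original one.
            corrupted[field] = text[:pos] + rng.choice(string.ascii_lowercase) + text[pos + 1:]
        else:
            corrupted[field] = value
    return corrupted


# Example: corrupt one hypothetical record at 5% missing and 5% error.
rng = random.Random(2015)
original = {"surname": "tanaka", "birth_date": "19600102", "sex": "f", "zip": "1130033"}
print(corrupt_record(original, missing_rate=0.05, error_rate=0.05, rng=rng))
```

Using a single draw per field keeps missingness and error mutually exclusive, so the two rates can be varied independently across scenarios.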
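The contrast between the two methods can be illustrated with a minimal Python sketch. Deterministic linkage is taken here to mean exact agreement on every identifier, and probabilistic linkage is illustrated with Fellegi-Sunter log-likelihood weights, a standard formulation that the abstract does not name explicitly. The fields, the m- and u-probabilities, and the 8-bit threshold are hypothetical values, not parameters or code from the study (which used SAS).

```python
import math

# Hypothetical linkage fields (non-unique identifiers) with illustrative
# m-probabilities P(agree | true match) and u-probabilities P(agree | non-match).
M_PROB = {"surname": 0.95, "birth_date": 0.97, "sex": 0.99, "zip": 0.90}
U_PROB = {"surname": 0.01, "birth_date": 0.003, "sex": 0.5, "zip": 0.05}


def deterministic_match(a, b, fields=tuple(M_PROB)):
    """Link only if every field is present in `a` and agrees exactly with `b`."""
    return all(a.get(f) is not None and a.get(f) == b.get(f) for f in fields)


def fs_weight(a, b):
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights.

    A field missing in either record contributes 0 (no evidence either way).
    """
    total = 0.0
    for f in M_PROB:
        x, y = a.get(f), b.get(f)
        if x is None or y is None:
            continue
        m, u = M_PROB[f], U_PROB[f]
        if x == y:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_match(a, b, threshold=8.0):
    """Link if the total weight reaches a (hypothetical) threshold in bits."""
    return fs_weight(a, b) >= threshold


rec_a = {"surname": "Tanaka", "birth_date": "1960-01-02", "sex": "F", "zip": "113"}
rec_b = dict(rec_a, surname="Tanak")     # one typo in the surname
rec_c = dict(rec_a, birth_date=None)     # birth date missing

for other in (rec_b, rec_c):
    print(deterministic_match(rec_a, other), round(fs_weight(rec_a, other), 2),
          probabilistic_match(rec_a, other))
```

With these assumed parameters, the typo pair scores about 9.18 bits and the pair with a missing birth date about 11.73 bits, so both are linked probabilistically while the exact-match rule rejects them. This shows in miniature why tolerance of imperfect identifiers tends to raise sensitivity.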
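The Results compare the methods on sensitivity and positive predictive value (PPV). As a sketch, assuming the set of true matches is known (as it is in a simulation), both measures can be computed from sets of linked record-ID pairs. The function name `linkage_validity` and the pair sets below are hypothetical toy data chosen only to show the arithmetic; they are not results from the study.

```python
def linkage_validity(predicted_links, true_links):
    """Return (sensitivity, PPV) for a set of predicted record pairs.

    Both arguments are sets of (id_in_file_a, id_in_file_b) tuples.
    sensitivity = true links found / all true links
    PPV         = true links found / all predicted links
    """
    found = len(predicted_links & true_links)
    sensitivity = found / len(true_links) if true_links else float("nan")
    ppv = found / len(predicted_links) if predicted_links else float("nan")
    return sensitivity, ppv


# Toy data (hypothetical): 4 true matches between file A and file B.
true_links = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
# A strict rule that links few pairs, all of them correct.
strict_links = {(1, "a"), (2, "b")}
# A more permissive rule that finds more true pairs plus one false link.
permissive_links = {(1, "a"), (2, "b"), (3, "c"), (5, "e")}

print(linkage_validity(strict_links, true_links))      # (0.5, 1.0)
print(linkage_validity(permissive_links, true_links))  # (0.75, 0.75)
```

In this toy example the strict rule gives up sensitivity in exchange for PPV, which is the kind of trade-off the Results describe between deterministic and probabilistic linkage.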
Pages: 80-86
Number of pages: 7