When to conduct probabilistic linkage vs. deterministic linkage? A simulation study

Cited by: 67
Authors
Zhu, Ying [1]
Matsuyama, Yutaka [1]
Ohashi, Yasuo [1,2]
Setoguchi, Soko [3,4]
Affiliations
[1] Univ Tokyo, Grad Sch Med, Dept Biostat, Tokyo 1130033, Japan
[2] Chuo Univ, Dept Integrated Sci & Engn Sustainable Soc, Tokyo 112, Japan
[3] Duke Univ, Sch Med, Duke Clin Res Inst, Durham, NC USA
[4] Univ Tokyo, Grad Sch Med, Dept Pharmacoepidemiol, Tokyo 1130033, Japan
Keywords
Record linkage; Probabilistic linkage; Deterministic linkage; Simulation study; Comparative validity; RECORD LINKAGE; HOSPITAL DISCHARGE; LINKING; HEALTH; REGISTRY; IDENTIFIERS; TRANSPARENT; ACCURACY; COHORT; CLAIMS;
DOI
10.1016/j.jbi.2015.05.012
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Introduction: When unique identifiers are unavailable, successful record linkage depends greatly on data quality and on the types of variables available. Probabilistic linkage theoretically captures more true matches than deterministic linkage because it tolerates imperfect identifiers, but studies have shown inconclusive results, likely owing to variation in data quality, in the implementation of linkage methodology, and in validation methods. This simulation study aimed to understand the data characteristics that affect the performance of probabilistic vs. deterministic linkage.

Methods: We created ninety-six scenarios that represent real-life situations using non-unique identifiers. We systematically varied discriminative power, the rates of missing and erroneous values, and file size to increase the range of linkage patterns and difficulties. We assessed the difference in performance between the linkage methods using standard validity measures and computation time.

Results: Across scenarios, deterministic linkage showed an advantage in positive predictive value (PPV), while probabilistic linkage showed an advantage in sensitivity. Probabilistic linkage uniformly outperformed deterministic linkage because it generated linkages with a better trade-off between sensitivity and PPV regardless of data quality. However, when the rates of missing and erroneous values were low, deterministic linkage did not perform significantly worse. The implementation of deterministic linkage in SAS took less than 1 min, whereas probabilistic linkage took 2 min to 2 h depending on file size.

Discussion: Our simulation study demonstrated that the intrinsic rate of missing and erroneous values in the linkage variables was key to choosing between linkage methods. In general, probabilistic linkage was the better choice, but for exceptionally good quality data (<5% error), deterministic linkage was the more resource-efficient choice. (C) 2015 Elsevier Inc. All rights reserved.
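The abstract does not give the linkage algorithms themselves, so the following is only a minimal sketch of the contrast it describes, written in Python rather than the SAS used in the study. It assumes an exact-match rule for deterministic linkage and a Fellegi-Sunter style summed log-weight for probabilistic linkage. The field names, the m/u probabilities, the threshold of 10, and the toy records are all hypothetical and chosen for illustration; they do not reproduce the study's scenarios or results.

```python
import math

# Hypothetical per-field probabilities (illustrative assumptions, not from the study):
# m = P(field agrees | true match), u = P(field agrees | non-match)
M_U = {
    "surname": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "zip": (0.90, 0.02),
}

def deterministic_match(a, b):
    """Link only if every identifier is present and agrees exactly."""
    return all(a[f] is not None and a[f] == b[f] for f in M_U)

def match_weight(a, b):
    """Sum log2 agreement/disagreement weights; missing fields contribute 0."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] is None or b[field] is None:
            continue
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def validity(links, truth):
    """Return (sensitivity, PPV) of a set of linked pairs against the true pairs."""
    tp = len(links & truth)
    sensitivity = tp / len(truth) if truth else 0.0
    ppv = tp / len(links) if links else 0.0
    return sensitivity, ppv

def rec(surname, birth_date, sex, zip_code):
    return {"surname": surname, "birth_date": birth_date, "sex": sex, "zip": zip_code}

# Toy files: pair 1 has a zip typo, pair 2 has a missing birth date.
file_a = [
    rec("sato", "1950-01-02", "F", "1130033"),
    rec("suzuki", "1962-07-15", "M", "1120001"),
    rec("tanaka", "1948-11-30", "F", "1130033"),
]
file_b = [
    rec("sato", "1950-01-02", "F", "1130033"),
    rec("suzuki", "1962-07-15", "M", "1120010"),
    rec("tanaka", None, "F", "1130033"),
]
truth = {(0, 0), (1, 1), (2, 2)}
THRESHOLD = 10.0  # hypothetical cut-off on the summed weight

pairs = [(i, j) for i in range(len(file_a)) for j in range(len(file_b))]
det_links = {(i, j) for i, j in pairs if deterministic_match(file_a[i], file_b[j])}
prob_links = {(i, j) for i, j in pairs
              if match_weight(file_a[i], file_b[j]) >= THRESHOLD}

print("deterministic (sensitivity, PPV):", validity(det_links, truth))
print("probabilistic (sensitivity, PPV):", validity(prob_links, truth))
```

On this toy data the exact-match rule misses the two pairs with a typo or a missing value, while the weighted score still links them. This is consistent in direction with the sensitivity advantage reported for probabilistic linkage, but a six-record example says nothing about effect sizes or about PPV under realistic file sizes.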
Pages: 80-86
Page count: 7