A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology

Cited by: 13
Authors
Ong, Toan C. [1]
Duca, Lindsey M. [2]
Kahn, Michael G. [1]
Crume, Tessa L. [1]
Affiliations
[1] Univ Colorado, Sch Med, Dept Pediat, Anschutz Med Campus,13611 East Colfax,Suite 210, Aurora, CO 80045 USA
[2] Univ Colorado, Colorado Sch Publ Hlth, Dept Epidemiol, Anschutz Med Campus, Aurora, CO 80045 USA
Keywords
record linkage; data harmonization; patient matching; congenital heart disease; hybrid; LINKING; IMPLEMENTATION; IDENTIFIERS
DOI
10.1093/jamia/ocz232
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The disjointed healthcare system and the absence of a universal patient identifier across systems necessitate accurate record linkage (RL). We describe the implementation and evaluation of a hybrid RL method in a statewide surveillance system for congenital heart disease.

Materials and Methods: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic methods and 1 probabilistic RL method, using first name, last name, social security number, date of birth, and house number, were first implemented independently and then sequentially in a hybrid approach to assess RL performance.

Results: A total of 16 480 nonunique individuals with congenital heart disease were ascertained. Deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each pair consisting of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine yielded 6451 linked pairs and 18%-24% more correct linked pairs than the independent methods. It also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.

Discussion: The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any single linkage technique used independently.

Conclusion: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.
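
The sequential routine described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the deterministic rules, the probabilistic model, or any thresholds, so everything below is an assumption made for illustration. The deterministic rule (exact social security number plus date of birth), the Fellegi-Sunter style agreement weights, the `MU` probabilities, the `THRESHOLD` value, and all function and field names are hypothetical. Only the five linkage fields and the blocking key (first 3 characters of last name plus gender) come from the abstract.

```python
import math
from itertools import combinations

# Hypothetical (m, u) probabilities per field, NOT taken from the paper:
# m = P(field agrees | records match), u = P(field agrees | records do not match).
MU = {
    "first": (0.95, 0.01),
    "last": (0.95, 0.005),
    "ssn": (0.90, 0.0001),
    "dob": (0.97, 0.003),
    "house": (0.80, 0.02),
}
THRESHOLD = 8.0  # hypothetical cut-off on the summed log2 weight


def deterministic_match(a, b):
    """Illustrative exact rule: SSN present and equal, and date of birth equal."""
    return bool(a["ssn"]) and a["ssn"] == b["ssn"] and a["dob"] == b["dob"]


def block_key(r):
    """Blocking key from the abstract: first 3 characters of last name plus gender."""
    return (r["last"][:3].lower(), r["gender"])


def match_weight(a, b):
    """Sum of Fellegi-Sunter style log2 agreement/disagreement weights."""
    total = 0.0
    for field, (m, u) in MU.items():
        if not a[field] or not b[field]:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def hybrid_link(records):
    """Deterministic pass first, then probabilistic pass on the pairs still unlinked."""
    links = set()
    pairs = list(combinations(enumerate(records), 2))
    for (i, a), (j, b) in pairs:
        if deterministic_match(a, b):
            links.add((i, j))
    for (i, a), (j, b) in pairs:
        if (i, j) in links or block_key(a) != block_key(b):
            continue  # already linked, or not in the same block
        if match_weight(a, b) >= THRESHOLD:
            links.add((i, j))
    return links


def precision_recall_f(predicted, truth):
    """Pair-level precision, recall, and F-measure against a set of true pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

Each record here is assumed to be a dict with the keys `first`, `last`, `ssn`, `dob`, `house`, and `gender`, with empty strings for missing values. Running the deterministic pass first means that records with a reliable identifier are linked even when a misspelled last name would place them in different blocks, while the probabilistic pass can still recover pairs in which the identifier is missing. That is one plausible reason a hybrid routine would find pairs that either method alone misses.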
Pages: 505-513
Page count: 9
Related papers
50 in total
  • [21] Simultaneously probabilistic and deterministic approach (SPADA) for the materials design: methodology and experimental validation
    Zhanna Yermekova
    Anatoliy Mironenko
    Journal of Materials Science, 2019, 54 : 12381 - 12391
  • [22] Deterministic and Probabilistic Wind Power Forecasting Using a Hybrid Method
    Huang, Chao-Ming
    Huang, Yann-Chang
    Haung, Kun-Yang
    Chen, Shin-Ju
    Yang, Seng-Pei
    2017 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2017, : 400 - 405
  • [23] A study on the probabilistic record linkage and its application
    Choi, Yeonok
    Lee, Sangin
    KOREAN JOURNAL OF APPLIED STATISTICS, 2021, 34 (05) : 849 - 861
  • [24] A Probabilistic Record Linkage Model for Survival Data
    Hof, Michel H.
    Ravelli, Anita C.
    Zwinderman, Aeilko H.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112 (520) : 1504 - 1515
  • [25] Probabilistic Record Linkage for Disclosure Risk Assessment
    Shlomo, Natalie
    PRIVACY IN STATISTICAL DATABASES, PSD 2014, 2014, 8744 : 269 - 282
  • [26] When To Conduct Probabilistic Linkage vs. Deterministic Linkage? A Simulation Study
    Zhu, Ying
    Matsuyama, Yutaka
    Ohashi, Yasuo
    Setoguchi, Soko
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2014, 23 : 348 - 348
  • [27] When to conduct probabilistic linkage vs. deterministic linkage? A simulation study
    Zhu, Ying
    Matsuyama, Yutaka
    Ohashi, Yasuo
    Setoguchi, Soko
    JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 56 : 80 - 86
  • [28] Record linkage of claims and cancer registries data-Evaluation of a deterministic linkage approach based on indirect personal identifiers
    Kollhorst, Bianca
    Reinders, Tammo
    Grill, Susann
    Eberle, Andrea
    Intemann, Timm
    Kieschke, Joachim
    Meyer, Martin
    Nennecke, Alice
    Rathmann, Wolfgang
    Pigeot, Iris
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2022, 31 (12) : 1287 - 1293
  • [29] Culling for Extreme-Scale Segmentation Volumes: A Hybrid Deterministic and Probabilistic Approach
    Beyer, Johanna
    Mohammed, Haneen
    Agus, Marco
    Al-Awami, Ali K.
    Pfister, Hanspeter
    Hadwiger, Markus
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2019, 25 (01) : 1132 - 1141
  • [30] An Introduction to Probabilistic Record Linkage with a Focus on Linkage Processing for WTC Registries
    Asher, Jana
    Resnick, Dean
    Brite, Jennifer
    Brackbill, Robert
    Cone, James
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2020, 17 (18) : 1 - 16