A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology

Cited by: 13
Authors
Ong, Toan C. [1]
Duca, Lindsey M. [2]
Kahn, Michael G. [1]
Crume, Tessa L. [1]
Affiliations
[1] Univ Colorado, Sch Med, Dept Pediat, Anschutz Med Campus, 13611 East Colfax, Suite 210, Aurora, CO 80045 USA
[2] Univ Colorado, Colorado Sch Publ Hlth, Dept Epidemiol, Anschutz Med Campus, Aurora, CO 80045 USA
Keywords
record linkage; data harmonization; patient matching; congenital heart disease; hybrid; LINKING; IMPLEMENTATION; IDENTIFIERS
DOI
10.1093/jamia/ocz232
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The disjointed healthcare system and the absence of a universal patient identifier across systems necessitate accurate record linkage (RL). We aim to describe the implementation and evaluation of a hybrid record linkage method in a statewide surveillance system for congenital heart disease.
Materials and Methods: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic methods and 1 probabilistic RL method using first name, last name, social security number, date of birth, and house number were initially implemented independently and then sequentially in a hybrid approach to assess RL performance.
Results: A total of 16 480 nonunique individuals with congenital heart disease were ascertained. Deterministic linkage methods, when performed independently, yielded 4505 linked pairs (each consisting of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine produced 6451 linkages and 18%-24% more correct linked pairs than the independent methods. It also achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.
Discussion: The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any independent linkage technique.
Conclusion: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.
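The sequential idea described in the abstract (deterministic linkage first, then probabilistic linkage within blocks) can be illustrated with a minimal Python sketch. Only the field list and the blocking key (first 3 characters of last name plus gender) come from the abstract. The deterministic rule (exact SSN and date of birth), the m/u agreement probabilities, the score threshold, and all function names are assumptions made for illustration, not the authors' implementation. The probabilistic stage uses a simplified Fellegi-Sunter style log-likelihood weight.

```python
import math
from itertools import combinations

# Linkage fields named in the abstract.
FIELDS = ["first", "last", "ssn", "dob", "house"]

# ASSUMED (m, u) probabilities: m = P(agree | same person),
# u = P(agree | different people). Illustrative values only.
M_U = {
    "first": (0.90, 0.01),
    "last": (0.90, 0.01),
    "ssn": (0.95, 0.001),
    "dob": (0.95, 0.01),
    "house": (0.80, 0.05),
}
THRESHOLD = 8.0  # ASSUMED cut-off on the summed log2 weight


def deterministic_match(a, b):
    """ASSUMED rule: exact agreement on a non-empty SSN and on date of birth."""
    return bool(a["ssn"]) and a["ssn"] == b["ssn"] and a["dob"] == b["dob"]


def block_key(rec):
    """Blocking key from the abstract: first 3 letters of last name plus gender."""
    return (rec["last"][:3].lower(), rec["gender"])


def match_weight(a, b):
    """Sum of log2 agreement/disagreement weights; missing values are skipped."""
    total = 0.0
    for field in FIELDS:
        m, u = M_U[field]
        if a[field] and b[field]:
            if a[field] == b[field]:
                total += math.log2(m / u)
            else:
                total += math.log2((1 - m) / (1 - u))
    return total


def hybrid_link(records):
    """Return a set of (i, j) index pairs, i < j, linked by either stage."""
    links = set()
    # Stage 1: deterministic linkage over all pairs.
    for i, j in combinations(range(len(records)), 2):
        if deterministic_match(records[i], records[j]):
            links.add((i, j))
    # Stage 2: probabilistic linkage, restricted to pairs sharing a block.
    blocks = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if (i, j) not in links and match_weight(records[i], records[j]) >= THRESHOLD:
                links.add((i, j))
    return links


def f_measure(predicted, truth):
    """Harmonic mean of precision and recall over sets of linked pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up toy records (not from the study).
records = [
    {"first": "Ana", "last": "Garcia", "ssn": "111223333", "dob": "2001-05-04", "house": "12", "gender": "F"},
    {"first": "Anna", "last": "Garcia-Lopez", "ssn": "111223333", "dob": "2001-05-04", "house": "12", "gender": "F"},
    {"first": "Ben", "last": "Ortiz", "ssn": "", "dob": "1999-01-20", "house": "7", "gender": "M"},
    {"first": "Ben", "last": "Ortiz", "ssn": "", "dob": "1999-01-20", "house": "70", "gender": "M"},
    {"first": "Bea", "last": "Ortega", "ssn": "", "dob": "1980-02-02", "house": "9", "gender": "M"},
]
links = hybrid_link(records)
```

In this toy data, the first two records are caught by the assumed deterministic rule despite the name variation, while the two "Ortiz" records lack an SSN and are recovered only by the probabilistic stage. This mirrors the abstract's point that a hybrid routine can find pairs that either method alone would miss.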
Pages: 505-513
Page count: 9