A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology

Times cited: 13
Authors
Ong, Toan C. [1]
Duca, Lindsey M. [2]
Kahn, Michael G. [1]
Crume, Tessa L. [1]
Affiliations
[1] Univ Colorado, Sch Med, Dept Pediat, Anschutz Med Campus, 13611 East Colfax, Suite 210, Aurora, CO 80045 USA
[2] Univ Colorado, Colorado Sch Publ Hlth, Dept Epidemiol, Anschutz Med Campus, Aurora, CO 80045 USA
Keywords
record linkage; data harmonization; patient matching; congenital heart disease; hybrid; LINKING; IMPLEMENTATION; IDENTIFIERS
DOI
10.1093/jamia/ocz232
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Objective: The disjointed healthcare system and the absence of a universal patient identifier across systems necessitate accurate record linkage (RL). We aim to describe the implementation and evaluation of a hybrid RL method in a statewide surveillance system for congenital heart disease.

Materials and Methods: Clear-text personally identifiable information on individuals in the Colorado Congenital Heart Disease surveillance system was obtained from 5 electronic health record and medical claims data sources. Two deterministic RL methods and 1 probabilistic RL method, using first name, last name, Social Security number, date of birth, and house number, were first implemented independently and then sequentially in a hybrid approach to assess RL performance.

Results: 16 480 nonunique individuals with congenital heart disease were ascertained. The deterministic linkage methods, when performed independently, yielded 4505 linked pairs (a linked pair consists of 2 records linked together within or across data sources). Probabilistic RL, using the first 3 characters of last name and gender for blocking, yielded 6294 linked pairs when executed independently. The hybrid linkage routine yielded 6451 linked pairs, an additional 18%-24% correct linked pairs compared with the independent methods, and achieved higher recall and F-measure scores than the probabilistic and deterministic methods performed independently.

Discussion: The hybrid approach increased linkage accuracy and identified pairs of linked records that would otherwise have been missed by any single independent linkage technique.

Conclusion: When performing RL within and across disparate data sources, the hybrid RL routine outperformed independent deterministic and probabilistic methods.
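To make the sequential hybrid idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It makes the following assumptions: the field names are hypothetical, and the two deterministic rules (exact SSN + date of birth; exact first name + last name + date of birth) are hypothetical stand-ins because the abstract does not spell them out. The probabilistic step uses classic Fellegi-Sunter log2 weights, which the abstract does not name, and the m/u probabilities and `THRESHOLD` are illustrative values rather than estimates. Blocking follows the abstract (first 3 letters of last name plus gender).

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical field names for the linkage variables listed in the abstract.
FIELDS = ["first", "last", "ssn", "dob", "house"]

# Hypothetical Fellegi-Sunter parameters: m = P(agree | true match),
# u = P(agree | non-match). In practice these are estimated from data.
M_PROB = {"first": 0.95, "last": 0.95, "ssn": 0.90, "dob": 0.97, "house": 0.80}
U_PROB = {"first": 0.02, "last": 0.01, "ssn": 0.0001, "dob": 0.003, "house": 0.05}
THRESHOLD = 10.0  # hypothetical log2-weight cut-off for declaring a link


def block_key(rec):
    # Blocking key: first 3 letters of last name plus gender.
    return (rec["last"][:3].upper(), rec["gender"])


def agrees(a, b, field):
    # A missing (empty) value never counts as agreement.
    return bool(a[field]) and a[field] == b[field]


def deterministic_match(a, b):
    # Two hypothetical exact-match rules standing in for the paper's rules.
    rule1 = agrees(a, b, "ssn") and agrees(a, b, "dob")
    rule2 = all(agrees(a, b, f) for f in ("first", "last", "dob"))
    return rule1 or rule2


def fs_weight(a, b):
    # Fellegi-Sunter score: sum of log2 agreement/disagreement weights.
    total = 0.0
    for f in FIELDS:
        m, u = M_PROB[f], U_PROB[f]
        if agrees(a, b, f):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def hybrid_link(records):
    # Step 1: deterministic rules over every candidate pair (no blocking).
    links = {
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if deterministic_match(records[i], records[j])
    }
    # Step 2: probabilistic scoring of the remaining pairs within each block.
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[block_key(rec)].append(i)
    for members in blocks.values():
        for i, j in combinations(members, 2):  # members ascend, so i < j
            if (i, j) not in links and fs_weight(records[i], records[j]) >= THRESHOLD:
                links.add((i, j))
    return links


def rec(first, last, ssn, dob, house, gender):
    # Hypothetical helper to build a record dict.
    return {"first": first, "last": last, "ssn": ssn,
            "dob": dob, "house": house, "gender": gender}


# Hypothetical toy records (not data from the study).
records = [
    rec("Ana", "Lopez", "123456789", "2010-01-02", "12", "F"),
    rec("Anna", "Lopez", "123456789", "2010-01-02", "14", "F"),
    rec("Ana", "Lopez", "", "2010-01-02", "12", "F"),
    rec("Jon", "Smith", "987654321", "2012-05-06", "7", "M"),
    rec("John", "Smith", "987654321", "2012-06-05", "7", "M"),
    rec("Sam", "Smith", "555000111", "2011-03-03", "9", "M"),
]
print(sorted(hybrid_link(records)))
```

On these toy records, the script prints `[(0, 1), (0, 2), (3, 4)]`. Pairs (0, 1) and (0, 2) come from the deterministic rules. Pair (3, 4) is found only by the probabilistic step, because the transposed date of birth and the first-name variant defeat both exact rules. In this sketch, the deterministic step deliberately runs without blocking, so an SSN + date-of-birth match can still link records whose last names differ (for example, after a name change). Whether the paper orders or scopes its steps this way is not stated in the abstract.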
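The abstract compares methods by recall and F-measure over linked pairs. A short sketch of how these scores are usually computed: precision is the share of predicted pairs that are true links, recall is the share of true links that were found, and the F-measure is their harmonic mean. The pair sets below are hypothetical toy values, and treating a pair as unordered is an assumption of the sketch.

```python
def normalize(pairs):
    # Treat (a, b) and (b, a) as the same unordered pair of record IDs.
    return {tuple(sorted(p)) for p in pairs}


def evaluate(predicted, truth):
    """Return (precision, recall, F-measure) for predicted vs. true links."""
    pred, gold = normalize(predicted), normalize(truth)
    tp = len(pred & gold)  # correctly linked pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data, not taken from the paper.
truth = {(0, 1), (0, 2), (1, 2), (3, 4)}
deterministic_only = {(0, 1), (0, 2)}
hybrid = {(0, 1), (0, 2), (3, 4)}

for name, pairs in [("deterministic", deterministic_only), ("hybrid", hybrid)]:
    p, r, f = evaluate(pairs, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy case, the extra pair found by the second step raises recall from 0.50 to 0.75 without lowering precision. This is the same direction of effect the abstract reports, shown only for illustration.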
Pages: 505-513
Number of pages: 9
Related papers (50 in total)
  • [31] Record-linkage methodology for prescribing research
    Libby, G
    MacDonald, TM
    Evans, JMM
    JOURNAL OF CLINICAL PHARMACY AND THERAPEUTICS, 2001, 26 (04) : 241 - 246
  • [32] AN OPERATIONAL APPROACH TO RECORD LINKAGE
    MI, MP
    KAGAWA, JT
    EARLE, ME
    METHODS OF INFORMATION IN MEDICINE, 1983, 22 (02) : 77 - 82
  • [33] A scaling approach to record linkage
    Goldstein, Harvey
    Harron, Katie
    Cortina-Borja, Mario
    STATISTICS IN MEDICINE, 2017, 36 (16) : 2514 - 2521
  • [34] Probabilistic record linkage of anonymous cancer registry records
    Meyer, M
    Radespiel-Tröger, M
    Vogel, C
    INNOVATIONS IN CLASSIFICATION, DATA SCIENCE, AND INFORMATION SYSTEMS, 2005 : 599 - 604
  • [35] Supervised Negative Binomial Classifier for Probabilistic Record Linkage
    Kashyap, Harish
    Byadarhaly, Kiran
    INTELLIGENT COMPUTING, VOL 2, 2022, 507 : 727 - 738
  • [36] Accuracy of a probabilistic record-linkage methodology used to track blood donors in the Mortality Information System database
    Capuani, Ligia
    Bierrenbach, Ana Luiza
    Abreu, Fatima
    Takecian, Pedro Losco
    Ferreira, Joao Eduardo
    Sabino, Ester Cerdeira
    CADERNOS DE SAUDE PUBLICA, 2014, 30 (08) : 1623 - 1632
  • [37] Modelling of microstructure evolution during thermal processes - a hybrid deterministic-probabilistic approach
    Hadi, Iraj
    Jabbareh, Mohammad-Amin
    Nikbakht, Roghayeh
    Assadi, Hamid
    PHYSICAL AND NUMERICAL SIMULATION OF MATERIAL PROCESSING VI, PTS 1 AND 2, 2012, 704-705 : 63 - 70
  • [38] A Hybrid Intelligent Model for Deterministic and Quantile Regression Approach for Probabilistic Wind Power Forecasting
    Ul Haque, Ashraf
    Nehrir, M. Hashem
    Mandal, Paras
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2014, 29 (04) : 1663 - 1672
  • [39] Assessing record linkage between health care and Vital Statistics databases using deterministic methods
    Li, Bing
    Quan, Hude
    Fong, Andrew
    Lu, Mingshan
    BMC HEALTH SERVICES RESEARCH, 2006, 6 (1)