Accuracy of probabilistic and deterministic record linkage: the case of tuberculosis

Cited by: 20
Authors
de Oliveira, Gisele Pinto [1]
de Souza Bierrenbach, Ana Luiza [2]
de Camargo Junior, Kenneth Rochel [3]
Coeli, Claudia Medina [4]
Pinheiro, Rejane Sobrino [4]
Affiliations
[1] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Programa Posgrad Saude Colet, Rio De Janeiro, RJ, Brazil
[2] Hosp Sirio Libanes, Inst Ensino & Pesquisa, Sao Paulo, SP, Brazil
[3] Univ Estado Rio de Janeiro, Inst Med Social, Rio De Janeiro, RJ, Brazil
[4] Univ Fed Rio de Janeiro, Inst Estudos Saude Colet, Rio De Janeiro, RJ, Brazil
Source
REVISTA DE SAUDE PUBLICA | 2016 / Vol. 50
Keywords
Tuberculosis; epidemiology; Data Accuracy; Sensitivity and Specificity; Epidemiological Surveillance; statistics & numerical data
DOI
10.1590/S1518-8787.2016050006327
Chinese Library Classification number
R1 [Preventive medicine, hygiene]
Subject classification code
1004; 120402
Abstract
OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate tuberculosis (TB) records, as well as the characteristics of discordant pairs. METHODS: The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules, each combining three or more fragments of the key variables, with or without modification (Soundex or substring). The probabilistic approach required a score cutoff point, above which links would be automatically classified as belonging to the same individual. The cutoff point was obtained by linking the Notifiable Diseases Information System - Tuberculosis database with itself, followed by manual review and analysis of ROC and precision-recall curves. Sensitivity and specificity were calculated to assess accuracy. RESULTS: Sensitivity was 87.2% for probabilistic and 95.2% for deterministic record linkage, and specificity was 99.8% and 99.9%, respectively. Missing values in the key variables and low similarity scores for name and date of birth were mainly responsible for the failure to identify records of the same individual with the techniques used. CONCLUSIONS: The two techniques showed a high level of correlation for pair classification. Although deterministic linkage identified more duplicate records than probabilistic linkage, the latter retrieved records not identified by the former. User needs and experience should be considered when choosing between the techniques.
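As a rough illustration of the deterministic approach and the accuracy measures described in the abstract, the following Python sketch applies a single rule built from three fragments and scores it against a manual gold standard. Everything specific here is an assumption made for illustration: the field names (`first`, `last`, `dob`), the choice of fragments, the toy records, and the gold-standard pairs are hypothetical and are not taken from the study, which used 70 rules that the abstract does not list.

```python
from itertools import combinations

def soundex(name: str) -> str:
    """American Soundex code (letter + 3 digits); empty string for empty input."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(c for c in name.upper() if c.isalpha())
    if not name:
        return ""
    out = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return (out + "000")[:4]

def fragments(rec):
    # Hypothetical rule: Soundex of first name, 4-letter substring of
    # last name, and year of birth (three fragments).
    return (soundex(rec["first"]), rec["last"][:4].upper(), rec["dob"][:4])

def rule_matches(a, b) -> bool:
    # A pair matches only if every fragment is present and identical.
    return all(bool(x) and x == y for x, y in zip(fragments(a), fragments(b)))

def sensitivity_specificity(predicted, truth):
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    return tp / (tp + fn), tn / (tn + fp)

# Toy records (hypothetical); record 4 has a missing date of birth.
records = [
    {"id": 1, "first": "Robert", "last": "Silva", "dob": "1980-05-17"},
    {"id": 2, "first": "Rupert", "last": "Silva", "dob": "1980-05-17"},
    {"id": 3, "first": "Maria", "last": "Souza", "dob": "1975-01-02"},
    {"id": 4, "first": "Maria", "last": "Souza", "dob": ""},
]
gold_duplicates = {(1, 2), (3, 4)}  # hypothetical manual-review result

pairs = list(combinations(records, 2))
predicted = [rule_matches(a, b) for a, b in pairs]
truth = [(a["id"], b["id"]) in gold_duplicates for a, b in pairs]
sens, spec = sensitivity_specificity(predicted, truth)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy setup the pair with the missing date of birth goes undetected, which mirrors the kind of failure the abstract attributes to missing values in the key variables.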
Pages: 9
Related papers
50 in total
  • [31] Transnational Record Linkage for Tuberculosis Surveillance and Program Evaluation
    Aiona, Kaylynn
    Lowenthal, Phillip
    Painter, John A.
    Reves, Randall
    Flood, Jennifer
    Parker, Matthew
    Fu, Yunxin
    Wall, Kirsten
    Walter, Nicholas D.
    PUBLIC HEALTH REPORTS, 2015, 130 (05) : 475 - 484
  • [32] RECORD LINKAGE STRATEGIES .2. PORTABLE SOFTWARE AND DETERMINISTIC MATCHING
    WAJDA, A
    ROOS, LL
    LAYEFSKY, M
    SINGLETON, JA
    METHODS OF INFORMATION IN MEDICINE, 1991, 30 (03) : 210 - 214
  • [33] ANATOMICAL ACCURACY AND FEASIBILITY OF PROBABILISTIC AND DETERMINISTIC TRACTOGRAPHY OF THE OPTIC RADIATION
    Nilsson, D. T.
    Rydenhag, B.
    Malmgren, K.
    Starck, G.
    Ljungberg, M.
    EPILEPSIA, 2010, 51 : 91 - 91
  • [35] Accuracy of a probabilistic record linkage strategy applied to identify deaths among cases reported to the Brazilian AIDS surveillance database
    Pereira Fonseca, Maria Goretti
    Coeli, Claudia Medina
    de Araujo Lucena, Francisca de Fatima
    Veloso, Valdilea Goncalves
    Carvalho, Marilia Sa
    CADERNOS DE SAUDE PUBLICA, 2010, 26 (07) : 1431 - 1438
  • [36] Probabilistic record linkage and a method to calculate the positive predictive value
    Blakely, T
    Salmond, C
    INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2002, 31 (06) : 1246 - 1252
  • [37] OPENRECLINK A FREE AND OPEN SOURCE SOLUTION FOR PROBABILISTIC RECORD LINKAGE
    Camargo, K.
    Coeli, C.
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 2011, 173 : S108 - S108
  • [38] Large-scale Entity Extraction and Probabilistic Record Linkage
    Villanustre, Flavio
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON COLLABORATION TECHNOLOGIES AND SYSTEMS (CTS), 2014 : 85 - 93
  • [39] Improving Probabilistic Record Linkage Using Statistical Prediction Models
    Moretti, Angelo
    Shlomo, Natalie
    INTERNATIONAL STATISTICAL REVIEW, 2023, 91 (03) : 368 - 394
  • [40] Probabilistic record linkage for the integrated surveillance of the road traffic accident
    Farchi, S. F.
    Chini, F. C.
    Fortini, M. F.
    Tuoto, T. T.
    Rossi, P. G. R. Giorgi
    Greco, V. G.
    Borgia, P. B.
    EUROPEAN JOURNAL OF EPIDEMIOLOGY, 2006, 21 : 55 - 55