A Novel Imputation Approach for Sharing Protected Public Health Data

Cited by: 5
Authors
Erdman, Elizabeth A. [1]
Young, Leonard D. [2]
Bernson, Dana L. [1]
Bauer, Cici [4]
Chui, Kenneth [3]
Stopka, Thomas J. [5, 6]
Affiliations
[1] Office of Population Health, Department of Public Health, Commonwealth of Massachusetts, Boston, MA, USA
[2] Bureau of Health Professions Licensure, Department of Public Health, Commonwealth of Massachusetts, Boston, MA, USA
[3] Department of Public Health and Community Medicine, Tufts University, Boston, MA, USA
[4] Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, TX, USA
[5] Tufts Clinical and Translational Science Institute, Tufts University, Medford, MA, USA
[6] Department of Public Health and Community Medicine, Tufts University, Medford, MA, USA
Keywords
Missing data
DOI
10.2105/AJPH.2021.306432
Chinese Library Classification number
R1 [Preventive medicine and hygiene]
Discipline classification codes
1004; 120402
Abstract
Objectives. To develop an imputation method that produces estimates for suppressed values within a shared government administrative data set, facilitating accurate data sharing as well as statistical and spatial analyses.

Methods. We developed an imputation approach that incorporated known features of suppressed Massachusetts surveillance data from 2011 to 2017 to predict missing values more precisely. For 35 de-identified opioid prescription data sets, our method combined a modified previous-or-next substitution, followed by mean imputation and a count adjustment, to estimate suppressed values before sharing. We modeled 4 methods and compared the results with baseline mean imputation.

Results. We assessed performance by comparing the root mean squared error (RMSE), mean absolute error (MAE), and proportional variance between imputed and suppressed values. Our method outperformed mean imputation: it retained 46% of the suppressed values' proportional variance with better precision (22% lower RMSE and 26% lower MAE) than simple mean imputation.

Conclusions. Our easy-to-implement imputation technique largely overcomes the adverse effects of suppressing low count values and yields better results than simple mean imputation. This novel method can be generalized by other researchers sharing protected public health surveillance data.
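The abstract describes the pipeline (previous-or-next substitution, then mean imputation, then a count adjustment, scored by RMSE, MAE, and proportional variance) only at a high level. The Python sketch below illustrates one possible reading of it on toy data; it is not the authors' code. Every specific is a hypothetical assumption: counts from 1 to 4 are suppressed and shown as None; "modified previous or next substitution" is read as taking the immediately preceding year's observed count (else the following year's) and clipping it into the suppression range; "mean imputation" fills any remaining cell with the midpoint of that range, which also serves as the baseline; the "count adjustment" rescales imputed cells so the table sums to a known published total; and "proportional variance" is taken to be the variance of the imputed values divided by the variance of the true suppressed values. The helper names (`impute_series`, `count_adjust`, `evaluate`) are invented for illustration.

```python
import math
from statistics import mean, pvariance

# HYPOTHETICAL suppression rule: counts from LOW to HIGH are withheld and
# appear as None in the shared data.
LOW, HIGH = 1, 4


def clip(x, lo=LOW, hi=HIGH):
    """Force a value into the range a suppressed count is known to lie in."""
    return max(lo, min(hi, x))


def impute_series(series):
    """Fill suppressed (None) entries in one unit's year-by-year counts.

    Assumed reading of "modified previous or next substitution": take the
    observed count from the immediately preceding year, else the following
    year, and clip it into [LOW, HIGH]. Cells with no observed neighbor fall
    back to mean imputation (the midpoint of [LOW, HIGH]).
    """
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        prev = series[i - 1] if i > 0 else None
        nxt = series[i + 1] if i + 1 < len(series) else None
        neighbor = prev if prev is not None else nxt
        out[i] = float(clip(neighbor)) if neighbor is not None else (LOW + HIGH) / 2
    return out


def suppressed_cells(data):
    """(row, column) positions of every suppressed cell."""
    return [(r, c) for r, row in enumerate(data) for c, v in enumerate(row) if v is None]


def count_adjust(data, imputed, known_total):
    """Rescale imputed cells so the whole table sums to a known total.

    Assumes an unsuppressed grand total is available. Re-clipping after
    scaling can leave a small residual against the target.
    """
    cells = suppressed_cells(data)
    observed = sum(v for row in data for v in row if v is not None)
    target = known_total - observed  # count mass hidden by suppression
    current = sum(imputed[r][c] for r, c in cells)
    adjusted = [list(row) for row in imputed]
    if cells and current > 0:
        factor = target / current
        for r, c in cells:
            adjusted[r][c] = clip(imputed[r][c] * factor)
    return adjusted


def evaluate(true_vals, imputed_vals):
    """RMSE, MAE, and (assumed definition) proportional variance."""
    errors = [i - t for t, i in zip(true_vals, imputed_vals)]
    rmse = math.sqrt(mean(e * e for e in errors))
    mae = mean(abs(e) for e in errors)
    prop_var = pvariance(imputed_vals) / pvariance(true_vals)
    return rmse, mae, prop_var


if __name__ == "__main__":
    # Toy data: 2 hypothetical units x 4 years; None marks a suppressed cell.
    truth = [[6, 3, 2, 7], [1, 1, 5, 9]]
    shared = [[6, None, None, 7], [None, None, 5, 9]]
    known_total = sum(map(sum, truth))  # assumed to be published
    step1 = [impute_series(row) for row in shared]
    final = count_adjust(shared, step1, known_total)
    cells = suppressed_cells(shared)
    t = [truth[r][c] for r, c in cells]
    print("baseline:", evaluate(t, [(LOW + HIGH) / 2] * len(cells)))
    print("sketch:  ", evaluate(t, [final[r][c] for r, c in cells]))
```

Run as a script, it prints the three metrics for the midpoint baseline and for the sketch on the toy table. Here the sketch has lower RMSE and MAE, while the constant baseline retains none of the variance; these toy numbers say nothing about the paper's reported results. Scaling and then re-clipping keeps the count adjustment simple; an exact constrained allocation would be a heavier alternative.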
Pages: 1830-1838
Number of pages: 9
Related papers
50 in total
  • [41] DATA SHARING AS A PUBLIC GOOD
    BARON, JN
    AMERICAN SOCIOLOGICAL REVIEW, 1988, 53 (01): R6-R8
  • [42] A Novel Missing Data Imputation Approach for Time Series Air Quality Data Based on Logistic Regression
    Chen, Mei
    Zhu, Hongyu
    Chen, Yongxu
    Wang, Youshuai
    ATMOSPHERE, 2022, 13 (07)
  • [43] Missing Data Imputation by LOLIMOT and FSVM/FSVR Algorithms with a Novel Approach: A Comparative Study
    Fazlikhani, Fatemeh
    Motakefi, Pegah
    Pedram, Mir Mohsen
    INFORMATION PROCESSING AND MANAGEMENT OF UNCERTAINTY IN KNOWLEDGE-BASED SYSTEMS: THEORY AND FOUNDATIONS, PT II, 2018, 854: 551-569
  • [44] A Novel Index Measure Imputation Algorithm for Missing Data Values: A Machine Learning Approach
    Madhu, G.
    Rajinikanth, T. V.
    2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2012: 81-87
  • [45] A Novel Spatiotemporal Data Low-Rank Imputation Approach for Traffic Sensor Network
    Chen, Xiaobo
    Liang, Shurong
    Zhang, Zhihao
    Zhao, Feng
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (20): 20122-20135
  • [46] Proposal for a European Public Health Research Infrastructure for Sharing of health and Medical administrative data (PHRIMA)
    Burgun, Anita
    Oksen, Dina V.
    Kuchinke, Wolfgang
    Prokosch, Hans-Ulrich
    Ganslandt, Thomas
    Buchan, Iain
    van Staa, Tjeerd
    Cunningham, James
    Gjerstorff, Marianne L.
    Dufour, Jean-Charles
    Gibrat, Jean-Francois
    Nikolski, Macha
    Verger, Pierre
    Cambon-Thomsen, Anne
    Masella, Cristina
    Lettieri, Emanuele
    Bertele, Paolo
    Salokannel, Marjut
    Thiebaut, Rodolphe
    Persoz, Charles
    Chene, Genevieve
    Ohmann, Christian
    MEDINFO 2015: EHEALTH-ENABLED HEALTH, 2015, 216: 1005
  • [47] BLOCKCHAIN-BASED ATHLETE HEALTH ARCHIVES DATA SHARING FROM THE PERSPECTIVE OF PUBLIC HEALTH
    Xiu, Chen
    REVISTA INTERNACIONAL DE MEDICINA Y CIENCIAS DE LA ACTIVIDAD FISICA Y DEL DEPORTE, 2024, 24 (95): 336-352
  • [48] Data sharing considerations to maximize the use of pathogen biological and genomics resources data for public health
    Holden, Nicola J.
    JOURNAL OF APPLIED MICROBIOLOGY, 2024, 135 (09)
  • [49] Enabling Efficient and Protected Sharing of Data in Cloud Computing
    Aarthi, D.
    Indira, N.
    2016 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2016
  • [50] A Linked Democracy Approach for Regulating Public Health Data
    Casanovas, P.
    Mendelson, D.
    Poblet, M.
    HEALTH AND TECHNOLOGY, 2017, 7 (04): 519-537