A Novel Imputation Approach for Sharing Protected Public Health Data

Cited by: 5
Authors
Erdman, Elizabeth A. [1 ]
Young, Leonard D. [2 ]
Bernson, Dana L. [1 ]
Bauer, Cici [4 ]
Chui, Kenneth [3 ]
Stopka, Thomas J. [5 ,6 ]
Affiliations
[1] Commonwealth Massachusetts, Off Populat Hlth, Dept Publ Hlth, Boston, MA USA
[2] Commonwealth Massachusetts, Bur Hlth Profess Licensure, Dept Publ Hlth, Boston, MA USA
[3] Tufts Univ, Dept Publ Hlth & Community Med, Boston, MA USA
[4] Univ Texas Hlth Sci Ctr Houston, Dept Biostat & Data Sci, Houston, TX USA
[5] Tufts Univ, Tufts Clin & Translat Sci Inst, Medford, MA USA
[6] Tufts Univ, Dept Publ Hlth & Community Med, Medford, MA USA
Keywords
MISSING DATA;
DOI
10.2105/AJPH.2021.306432
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Objectives. To develop an imputation method to produce estimates for suppressed values within a shared government administrative data set to facilitate accurate data sharing and statistical and spatial analyses. Methods. We developed an imputation approach that incorporated known features of suppressed Massachusetts surveillance data from 2011 to 2017 to predict missing values more precisely. Our methods for 35 de-identified opioid prescription data sets combined modified previous or next substitution followed by mean imputation and a count adjustment to estimate suppressed values before sharing. We modeled 4 methods and compared the results to baseline mean imputation. Results. We assessed performance by comparing root mean squared error (RMSE), mean absolute error (MAE), and proportional variance between imputed and suppressed values. Our method outperformed mean imputation; we retained 46% of the suppressed value's proportional variance with better precision (22% lower RMSE and 26% lower MAE) than simple mean imputation. Conclusions. Our easy-to-implement imputation technique largely overcomes the adverse effects of low count value suppression with superior results to simple mean imputation. This novel method is generalizable to researchers sharing protected public health surveillance data.
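The abstract evaluates imputation quality with root mean squared error (RMSE) and mean absolute error (MAE) between imputed and suppressed values. The sketch below illustrates only those two metrics. It is not the authors' imputation method, whose details this record does not give. The function names, the toy counts, and both sets of imputed values are hypothetical and chosen purely for illustration.

```python
import math

def rmse(true_vals, imputed_vals):
    """Root mean squared error between true and imputed values."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / n)

def mae(true_vals, imputed_vals):
    """Mean absolute error between true and imputed values."""
    n = len(true_vals)
    return sum(abs(t - p) for t, p in zip(true_vals, imputed_vals)) / n

# Hypothetical toy data: true counts that were suppressed before sharing.
true_counts = [2, 9, 4, 7]

# Baseline: every suppressed cell gets the same value (here the mean of the
# toy counts, standing in for simple mean imputation).
mean_imputed = [sum(true_counts) / len(true_counts)] * len(true_counts)

# Hypothetical alternative estimates, e.g. from some neighbor-based rule.
alt_imputed = [3, 8, 5, 6]

for label, imputed in [("mean", mean_imputed), ("alternative", alt_imputed)]:
    print(label, "RMSE:", rmse(true_counts, imputed), "MAE:", mae(true_counts, imputed))
```

Lower RMSE and MAE indicate imputed values closer to the suppressed ones. RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both.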
Pages: 1830-1838
Number of pages: 9
Related papers
50 items in total
  • [21] Sharing Research Data to Improve Public Health: A Funder Perspective
    Carr, David
    Littler, Katherine
    JOURNAL OF EMPIRICAL RESEARCH ON HUMAN RESEARCH ETHICS, 2015, 10 (03) : 314 - 316
  • [22] Make Data Sharing Routine to Prepare for Public Health Emergencies
    Chretien, Jean-Paul
    Rivers, Caitlin M.
    Johansson, Michael A.
    PLOS MEDICINE, 2016, 13 (08):
  • [23] Public health data collection and sharing using HIPAA messages
    Wu M.
    Zhao T.
    Wu C.
    Journal of Medical Systems, 2005, 29 (4) : 303 - 316
  • [24] IDENTIFYING THE PUBLIC'S PREFERENCES FOR SHARING HEALTH DATA DIGITALLY
    Johansson, Viberg J.
    Mascalzoni, D.
    Kaye, J.
    Shah, N.
    Jonsdottir, G. A.
    Haraldsdottir, E.
    Veldwijk, J.
    VALUE IN HEALTH, 2020, 23 : S681 - S681
  • [25] A Novel Approach for Collecting and Sharing Software Metrics Data
    Khomyakov, Ilya
    Sillitti, Alberto
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019, : 2164 - 2167
  • [26] Data Sharing and Global Public Health: Defining What We Mean by Data
    Schwalbe, Nina
    Wahl, Brian
    Song, Jingyi
    Lehtimaki, Susanna
    FRONTIERS IN DIGITAL HEALTH, 2020, 2
  • [27] NOVEL IMPUTATION FOR TIME SERIES DATA
    Chang, Chia-Yang
    Wang, Cheng-Ru
    Lee, Shie-Jue
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOL. 2, 2015, : 916 - 920
  • [28] A new approach for data editing and imputation
    Sergio Delgado-Quintero
    Juan-José Salazar-González
    Mathematical Methods of Operations Research, 2008, 68
  • [29] A new approach for data editing and imputation
    Delgado-Quintero, Sergio
    Salazar-Gonzalez, Juan-Jose
    MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2008, 68 (03) : 407 - 428
  • [30] A Probabilistic Approach for Missing Data Imputation
    Arefin, Muhammed Nazmul
    Masum, Abdul Kadar Muhammad
    COMPLEXITY, 2024, 2024