A Framework for Multiple Imputation in Cluster Analysis

Cited by: 58
Authors
Basagana, Xavier [1, 2, 3]
Barrera-Gomez, Jose [1, 2, 3]
Benet, Marta [1, 2, 3]
Anto, Josep M. [1, 2, 3, 4]
Garcia-Aymerich, Judith [1, 2, 3, 4]
Affiliations
[1] Ctr Res Environm Epidemiol, Barcelona 08003, Catalonia, Spain
[2] Hosp del Mar, Res Inst, Barcelona, Spain
[3] CIBERESP, Barcelona, Spain
[4] Univ Pompeu Fabra, Fac Hlth & Life Sci, Dept Expt & Hlth Sci, Barcelona, Spain
Keywords
classification; cluster analysis; imputation; missing data; fully conditional specification
DOI
10.1093/aje/kws289
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Multiple imputation is a common technique for dealing with missing values and is mostly applied in regression settings. Its application in cluster analysis, where the main objective is to classify individuals into homogeneous groups, involves several difficulties that are not well characterized in the current literature. In this paper, we propose a framework for applying multiple imputation to cluster analysis when the original data contain missing values. The proposed framework incorporates the selection of the final number of clusters and a variable reduction procedure, which may be needed in data sets where the ratio of the number of persons to the number of variables is small. We suggest ways to report how the uncertainty due to multiple imputation of missing data affects the cluster analysis outcomes, namely the final number of clusters, the results of a variable selection procedure (if applied), and the assignment of individuals to clusters. The proposed framework is illustrated with data from the Phenotype and Course of Chronic Obstructive Pulmonary Disease (PAC-COPD) Study (Spain, 2004-2008), which aimed to classify patients with chronic obstructive pulmonary disease into different disease subtypes.
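The abstract mentions reporting how imputation uncertainty affects the final number of clusters and the assignment of individuals. The following is a minimal sketch of one generic way to summarize this. It assumes each of the M imputed data sets has already been clustered (by any method) and that the number of clusters selected in each imputation and the resulting label vectors are available. It tabulates the selected number of clusters across imputations and computes, for each pair of individuals, the proportion of imputations in which they share a cluster (a consensus-style co-assignment summary). This is not presented as the authors' published procedure. The function names (`summarize_k`, `co_assignment`, `unstable_pairs`) and the 0.2/0.8 thresholds are hypothetical illustrations.

```python
from collections import Counter
from itertools import combinations


def summarize_k(chosen_k):
    """Proportion of imputed data sets in which each number of clusters was selected."""
    m = len(chosen_k)
    counts = Counter(chosen_k)
    return {k: n / m for k, n in sorted(counts.items())}


def co_assignment(labelings):
    """Proportion of imputations in which each pair of individuals shares a cluster.

    labelings: list of M label lists (one per imputed data set), each of length n.
    Comparing pairs instead of raw labels sidesteps label switching, because
    cluster numbering is arbitrary and may differ between imputations.
    """
    m = len(labelings)
    n = len(labelings[0])
    props = {}
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        props[(i, j)] = together / m
    return props


def unstable_pairs(props, low=0.2, high=0.8):
    """Pairs that are neither consistently together nor consistently apart.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    return sorted(pair for pair, p in props.items() if low < p < high)


if __name__ == "__main__":
    # Hypothetical results from 3 imputed data sets with 4 individuals.
    labelings = [
        [0, 0, 1, 1],
        [1, 1, 0, 0],  # same partition as the first, with labels swapped
        [0, 0, 0, 1],
    ]
    print(summarize_k([3, 3, 2, 3, 3]))
    props = co_assignment(labelings)
    print(props)
    print(unstable_pairs(props))
```

Working with pairwise proportions keeps the summary independent of how each imputation happens to number its clusters. Individuals that appear in many unstable pairs are those whose assignment is most sensitive to the imputed values.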
Pages: 718-725
Page count: 8