Multiple Imputation for Robust Cluster Analysis to Address Missingness in Medical Data

Cited by: 1

Authors:

Harder, Arnold A. [1]

Olbricht, Gayla R. [1,2]

Ekuma, Godwin [3]

Hier, Daniel B. [2]

Obafemi-Ajayi, Tayo [2,4]

Affiliations:

[1] Missouri Univ Sci & Technol, Dept Math & Stat, Rolla, MO 65409 USA

[2] Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Appl Computat Intelligence Lab, Rolla, MO 65409 USA

[3] Missouri State Univ, Dept Comp Sci, Springfield, MO 65897 USA

[4] Missouri State Univ, Engn Program, Springfield, MO 65897 USA

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Multiple data imputation; clustering; ensemble learning; canonical discriminant analysis; mixture models; traumatic brain injury; missingness; INFERENCE; MODELS; MICE;

DOI:

10.1109/ACCESS.2024.3377242

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Cluster analysis has been applied to a wide range of problems as an exploratory tool to enhance knowledge discovery. In medical data, clustering aids disease subtyping, i.e., identifying homogeneous patient subgroups. Missing data is a common problem in medical research and can bias clustering results if not properly handled. Yet multiple imputation has been under-utilized to address missingness when clustering medical data. Its limited integration, despite its known advantages, could be attributed to many factors. These include methodological complexity, difficulties in pooling results to obtain a consensus clustering, uncertainty regarding quality metrics, and a lack of accepted pipelines. A few studies have examined the feasibility of implementing multiple imputation for cluster analysis on simulated or small datasets. While these studies have begun to address how to pool imputed values and quantify uncertainty in clustering due to imputation, a need remains for a complete framework that integrates multiple imputation into the clustering of complex medical data with sophisticated clustering algorithms. We propose a cluster analysis framework that mitigates bias and addresses these limitations. It includes methods to pool multiple imputed datasets, create a consensus cluster solution by ensemble methods, and select an optimal number of clusters based on validity indices. It also estimates uncertainty about cluster membership attributable to the imputation and identifies features that characterize the derived clusters. The utility of this framework is illustrated by its application to a traumatic brain injury dataset with missing data. Our analysis revealed six multifaceted clusters that differed with respect to Glasgow Coma Score (GCS), mechanism of injury, sociodemographics, vitals, lab values, and radiological presentation. The most severe cluster consisted of single, relatively young patients injured by motor accident, with higher GCS severity scores. Comparative analysis with the miclust R package, along with statistical validation of cluster characterization, demonstrates the framework's robust performance.
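
The abstract does not say which ensemble method the framework uses to build its consensus clustering. As a rough illustration only, the sketch below uses a co-association (evidence accumulation) approach, which is one common choice and is an assumption here, not the authors' method. It assumes each imputed dataset has already been clustered, so the input is one label vector per imputation. The function names `co_association`, `consensus_clusters`, and `membership_stability`, and the 0.5 threshold, are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster.

    Label values need not match across labelings, so label switching
    between imputed datasets does not matter.
    """
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(co, threshold=0.5):
    """Link samples whose co-association exceeds the threshold and
    return the connected components as consensus labels (assumed rule)."""
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def membership_stability(co, labels):
    """Mean co-association of each sample with the other members of its
    consensus cluster; singletons are given 1.0 by convention."""
    scores = []
    for i, li in enumerate(labels):
        mates = [j for j, lj in enumerate(labels) if lj == li and j != i]
        if mates:
            scores.append(sum(co[i][j] for j in mates) / len(mates))
        else:
            scores.append(1.0)
    return scores


# Toy input: cluster labels for 5 patients from 3 imputed datasets.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
co = co_association(labelings)
labels = consensus_clusters(co)
print(labels)
print(membership_stability(co, labels))
```

In this toy input the last patient is grouped with patients 3 and 4 in only two of the three imputations, so its stability score is lower than the others. That is a simple stand-in for the imputation-related membership uncertainty the abstract mentions.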


Pages: 42974-42991

Number of pages: 18

