Investigation of parameter uncertainty in clustering using a Gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap

Cited by: 0
Authors
Adrian O’Hagan
Thomas Brendan Murphy
Luca Scrucca
Isobel Claire Gormley
Affiliations
[1] University College Dublin, School of Mathematics and Statistics and Insight: Centre for Data Analytics
[2] Università degli Studi di Perugia, Department of Economics
Source
Computational Statistics | 2019 / Vol. 34
Keywords
mclust; MclustBootstrap; Precision; Standard errors; Variance estimation;
Abstract
Mixture models with (multivariate) Gaussian components are a popular tool in model-based clustering. Such models are often fitted by a procedure that maximizes the likelihood, such as the EM algorithm. At convergence, the maximum likelihood parameter estimates are typically reported, but in most cases little emphasis is placed on the variability associated with these estimates. In part this may be due to the fact that standard errors are not directly calculated in the model-fitting algorithm, either because they are not required to fit the model, or because they are difficult to compute. The examination of standard errors in model-based clustering is therefore typically neglected. Sampling-based methods, such as the jackknife (JK), bootstrap (BS) and parametric bootstrap (PB), are intuitive, generalizable approaches to assessing parameter uncertainty in model-based clustering using a Gaussian mixture model. This paper provides a review and empirical comparison of the jackknife, bootstrap and parametric bootstrap methods for producing standard errors and confidence intervals for mixture parameters. The performance of such sampling methods in the presence of small and/or overlapping clusters requires consideration, however; here the weighted likelihood bootstrap (WLBS) approach is demonstrated to be effective in addressing this concern in a model-based clustering framework. The JK, BS, PB and WLBS methods are illustrated and contrasted through simulation studies and through the traditional Old Faithful data set and the Thyroid data set. The MclustBootstrap function, available in the most recent release of the popular R package mclust, facilitates the implementation of the JK, BS, PB and WLBS approaches to estimating parameter uncertainty in the context of model-based clustering.
The JK, WLBS and PB approaches to variance estimation are shown to be robust and provide good coverage across a range of real and simulated data sets when performing model-based clustering, but care is advised when using the BS in such settings. In the case of poor model fit (for example, for data with small and/or overlapping clusters), JK and BS are found to suffer from not being able to fit the specified model in many of the sub-samples formed. The PB also suffers when model fit is poor, since it relies on data sets simulated from the fitted model as the basis for the variance estimation calculations. However, the WLBS will generally provide a robust solution, driven by the fact that all observations are represented with some weight in each of the sub-samples formed under this approach.
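The contrast between the BS and the WLBS can be illustrated with a minimal sketch. It is not the MclustBootstrap implementation: it assumes a one-dimensional, two-component mixture fitted by a small hand-written weighted EM, and it assumes WLBS weights drawn from a standard exponential distribution and rescaled to sum to the sample size. All function names (`weighted_em`, `bootstrap_weights`, `wlbs_weights`, `resample_se`) and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a univariate normal with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_em(data, weights, n_iter=50):
    """Weighted EM for a two-component 1-D Gaussian mixture (illustrative only).

    Returns (pi1, mu1, mu2, var1, var2); component 1 starts on the lower half.
    """
    srt = sorted(data)
    half = len(srt) // 2
    mu = [sum(srt[:half]) / half, sum(srt[half:]) / (len(srt) - half)]
    grand = sum(data) / len(data)
    var0 = sum((x - grand) ** 2 for x in data) / len(data)
    var = [var0, var0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in data:
            d1 = pi[0] * normal_pdf(x, mu[0], var[0])
            d2 = pi[1] * normal_pdf(x, mu[1], var[1])
            resp.append(d1 / (d1 + d2) if d1 + d2 > 0 else 0.5)
        # M-step: each point contributes in proportion to its observation weight.
        for k in (0, 1):
            wk = [w * (r if k == 0 else 1.0 - r) for w, r in zip(weights, resp)]
            tot = sum(wk)
            if tot <= 0:
                continue  # component received no weight; keep previous values
            pi[k] = tot / sum(weights)
            mu[k] = sum(w * x for w, x in zip(wk, data)) / tot
            var[k] = max(sum(w * (x - mu[k]) ** 2 for w, x in zip(wk, data)) / tot, 1e-6)
    return (pi[0], mu[0], mu[1], var[0], var[1])


def bootstrap_weights(n, rng):
    """Nonparametric bootstrap as integer counts; some observations get weight 0."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return [float(c) for c in counts]


def wlbs_weights(n, rng):
    """Assumed WLBS scheme: exponential draws rescaled to sum to n."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]


def resample_se(data, weight_fn, n_rep, rng):
    """Standard deviation of each parameter estimate across reweighted fits."""
    fits = [weighted_em(data, weight_fn(len(data), rng)) for _ in range(n_rep)]
    ses = []
    for k in range(len(fits[0])):
        vals = [f[k] for f in fits]
        m = sum(vals) / len(vals)
        ses.append(math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1)))
    return ses


rng = random.Random(2019)
# Simulated data: one large and one small cluster (values chosen for illustration).
data = [rng.gauss(0.0, 1.0) for _ in range(90)] + [rng.gauss(5.0, 1.0) for _ in range(30)]
for name, fn in (("BS", bootstrap_weights), ("WLBS", wlbs_weights)):
    print(name, [round(s, 3) for s in resample_se(data, fn, 30, rng)])
```

Expressing the BS as integer counts makes the difference visible: a small cluster can lose most of its points in a BS replicate, whereas every observation keeps a positive weight under the WLBS scheme assumed here.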
Pages: 1779-1813
Number of pages: 34