An evaluation of the bootstrap for model validation in mixture models

Cited by: 14
Authors
Jaki, Thomas [1]
Su, Ting-Li [2]
Kim, Minjung [3]
Van Horn, M. Lee [4]
Affiliations
[1] Univ Lancaster, Dept Math & Stat, Lancaster LA1 4YF, England
[2] Univ Manchester, Div Dent, Manchester, Lancs, England
[3] Univ Alabama, Dept Psychol, Box 870348, Tuscaloosa, AL 35487 USA
[4] Univ New Mexico, Coll Educ, Albuquerque, NM 87131 USA
Keywords
Finite mixture models; Leave-k-out cross-validation; Model validation; Nonparametric Bootstrap; Regression mixture models; FINITE MIXTURES; BAYESIAN-INFERENCE; COMPONENTS; NUMBER;
DOI
10.1080/03610918.2017.1303726
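Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes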
020208 ; 070103 ; 0714 ;
Abstract
Bootstrapping has been used as a diagnostic tool for validating model results for a wide array of statistical models. Here we evaluate the use of the non-parametric bootstrap for model validation in mixture models. We show that the bootstrap is problematic for validating the results of class enumeration and for demonstrating the stability of parameter estimates in both finite mixture and regression mixture models. In only 44% of simulations did bootstrapping detect the correct number of classes in at least 90% of the bootstrap samples for a finite mixture model without any model violations. For regression mixture models and cases with violated model assumptions, the performance was even worse. Consequently, we cannot recommend the non-parametric bootstrap for validating mixture models. The cause of the problem is that, when resampling is used, influential individual observations have a high likelihood of being sampled many times. The presence of multiple replications of even moderately extreme observations is shown to lead to additional latent classes being extracted. To verify that these replications cause the problems, we show that leave-k-out cross-validation, where sub-samples are taken without replacement, does not suffer from the same problem.
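The resampling mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design and fits no mixture model. It only contrasts how often a single observation is repeated under sampling with replacement (non-parametric bootstrap) and under leave-k-out sub-sampling without replacement. The sample size `n`, the number of held-out observations `k`, the replication count `reps`, and all function names are assumptions chosen for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(n, rng):
    """Indices for one non-parametric bootstrap sample: n draws WITH replacement."""
    return [rng.randrange(n) for _ in range(n)]


def leave_k_out_sample(n, k, rng):
    """Indices for one leave-k-out sub-sample: n - k draws WITHOUT replacement."""
    return rng.sample(range(n), n - k)


def max_multiplicity(indices):
    """Largest number of times any single observation appears in the sample."""
    return max(Counter(indices).values())


def prob_repeated(n):
    """Exact probability that one given observation appears at least twice
    in a bootstrap sample of size n (binomial with n trials, success 1/n)."""
    p_zero = (1 - 1 / n) ** n        # observation never drawn
    p_one = (1 - 1 / n) ** (n - 1)   # drawn exactly once: n * (1/n) * (1 - 1/n)^(n-1)
    return 1 - p_zero - p_one


# Hypothetical settings, not taken from the paper.
rng = random.Random(0)
n, k, reps = 200, 20, 500

boot_max = [max_multiplicity(bootstrap_sample(n, rng)) for _ in range(reps)]
lko_max = [max_multiplicity(leave_k_out_sample(n, k, rng)) for _ in range(reps)]

print("bootstrap: mean of the largest multiplicity =", sum(boot_max) / reps)
print("leave-k-out: largest multiplicity observed =", max(lko_max))
print("P(a given observation is repeated in a bootstrap sample) =", prob_repeated(n))
```

For large `n`, the probability returned by `prob_repeated` approaches 1 - 2/e, roughly 0.26, so about a quarter of the observations, extreme ones included, are duplicated in any given bootstrap sample. A sub-sample drawn without replacement never contains a duplicate.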
Pages: 1028-1038
Number of pages: 11