Variable selection in model-based clustering and discriminant analysis with a regularization approach

Cited by: 12
Authors
Celeux, Gilles [1, 2]
Maugis-Rabusseau, Cathy [3]
Sedki, Mohammed [4, 5]
Affiliations
[1] INRIA, Dept Math, Bâtiment 425, F-91405 Orsay, France
[2] Univ Paris Sud, Bâtiment 425, F-91405 Orsay, France
[3] Univ Toulouse, Inst Math Toulouse, UMR 5219, INSA Toulouse, 135 Ave Rangueil, F-31077 Toulouse 4, France
[4] Paris Sud Univ, Batiment 15-16,16 Ave Paul Vaillant Couturier, F-94807 Villejuif, France
[5] Hop Paul Brousse, INSERM, U1181, Batiment 15-16,16 Ave Paul Vaillant Couturier, F-94807 Villejuif, France
Keywords
Variable selection; Lasso; Gaussian mixture; Clustering; Classification;
DOI
10.1007/s11634-018-0322-5
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
Several methods for variable selection have been proposed in model-based clustering and classification. These make use of backward or forward procedures to define the roles of the variables. Unfortunately, such stepwise procedures are slow and the resulting algorithms inefficient when analyzing large data sets with many variables. In this paper, we propose an alternative regularization approach for variable selection in model-based clustering and classification. In our approach the variables are first ranked using a lasso-like procedure in order to avoid slow stepwise algorithms. Thus, the variable selection methodology of Maugis et al. (Comput Stat Data Anal 53:3872-3882, 2009b) can be efficiently applied to high-dimensional data sets.
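The ranking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes class labels are already known (the discriminant analysis setting), standardizes the variables, soft-thresholds the class means over a grid of penalties, and ranks each variable by how many penalty values leave at least one of its class means non-zero. The function names `soft_threshold` and `rank_variables`, the penalty grid, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Lasso-style shrinkage: move x toward zero by lam, clipping at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def rank_variables(X, labels, lambdas):
    """Rank variables by how long their class means survive L1 shrinkage.

    Assumption (illustrative only): labels are given, and a variable is
    'active' at penalty lam if any soft-thresholded class mean is non-zero.
    """
    # Standardize so that one penalty is comparable across variables.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    classes = np.unique(labels)
    # Matrix of class means, shape (number of classes, number of variables).
    means = np.vstack([Xs[labels == k].mean(axis=0) for k in classes])
    scores = np.zeros(X.shape[1])
    for lam in lambdas:
        shrunk = soft_threshold(means, lam)
        scores += np.any(shrunk != 0.0, axis=0)
    # Highest score first; stable sort keeps the original order among ties.
    order = np.argsort(-scores, kind="stable")
    return order, scores


# Simulated data: variables 0 and 1 separate the two classes, the rest are noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[labels == 1, 0] += 4.0
X[labels == 1, 1] += 2.0

order, scores = rank_variables(X, labels, np.linspace(0.05, 1.0, 20))
print("ranking:", order)
print("scores:", scores)
```

Once such a ranking is available, the variables can be examined in that order, so each variable is considered once rather than being repeatedly added or removed as in a stepwise search.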
Pages: 259-278
Page count: 20
Related papers
50 in total
  • [31] Variable selection for model-based clustering using the integrated complete-data likelihood
    Matthieu Marbac
    Mohammed Sedki
    Statistics and Computing, 2017, 27 : 1049 - 1063
  • [32] Variable Selection for Skewed Model-Based Clustering: Application to the Identification of Novel Sleep Phenotypes
    Wallace, Meredith L.
    Buysse, Daniel J.
    Germain, Anne
    Hall, Martica H.
    Iyengar, Satish
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, 113 (521) : 95 - 110
  • [33] Bayesian regularization for normal mixture estimation and model-based clustering
    Fraley, Chris
    Raftery, Adrian E.
    JOURNAL OF CLASSIFICATION, 2007, 24 (02) : 155 - 181
  • [34] A model-based approach to sequence clustering
    Binsztok, H
    Artières, T
    Gallinari, P
    ECAI 2004: 16TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 110 : 420 - 424
  • [35] Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering
    Chris Fraley
    Adrian E. Raftery
    Journal of Classification, 2007, 24 : 155 - 181
  • [36] Kernel Canonical Discriminant Analysis Based on Variable Selection
    Ikeda, Seiichi
    Sato, Yoshiharu
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2009, 13 (04) : 416 - 420
  • [37] Regularization and variable selection in Heckman selection model
    Emmanuel O. Ogundimu
    Statistical Papers, 2022, 63 : 421 - 439
  • [38] Regularization and variable selection in Heckman selection model
    Ogundimu, Emmanuel O.
    STATISTICAL PAPERS, 2022, 63 (02) : 421 - 439
  • [39] Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis
    Andrews, Jeffrey L.
    McNicholas, Paul D.
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2011, 141 (04) : 1479 - 1486
  • [40] HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data
    Berge, Laurent
    Bouveyron, Charles
    Girard, Stephane
    JOURNAL OF STATISTICAL SOFTWARE, 2012, 46 (06) : 1 - 29