Variable selection in model-based clustering and discriminant analysis with a regularization approach

Cited by: 12
Authors
Celeux, Gilles [1, 2]
Maugis-Rabusseau, Cathy [3]
Sedki, Mohammed [4, 5]
Affiliations
[1] INRIA, Dept Math, Bâtiment 425, F-91405 Orsay, France
[2] Univ Paris Sud, Bâtiment 425, F-91405 Orsay, France
[3] Univ Toulouse, Inst Math Toulouse, UMR 5219, INSA Toulouse, 135 Ave Rangueil, F-31077 Toulouse 4, France
[4] Paris Sud Univ, Batiment 15-16,16 Ave Paul Vaillant Couturier, F-94807 Villejuif, France
[5] Hop Paul Brousse, INSERM, U1181, Batiment 15-16,16 Ave Paul Vaillant Couturier, F-94807 Villejuif, France
Keywords
Variable selection; Lasso; Gaussian mixture; Clustering; Classification;
DOI
10.1007/s11634-018-0322-5
Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Several methods for variable selection have been proposed in model-based clustering and classification. These make use of backward or forward procedures to define the roles of the variables. Unfortunately, such stepwise procedures are slow and the resulting algorithms inefficient when analyzing large data sets with many variables. In this paper, we propose an alternative regularization approach for variable selection in model-based clustering and classification. In our approach the variables are first ranked using a lasso-like procedure in order to avoid slow stepwise algorithms. Thus, the variable selection methodology of Maugis et al. (Comput Stat Data Anal 53:3872-3882, 2009b) can be efficiently applied to high-dimensional data sets.
Pages: 259 - 278
Page count: 20
Related papers
50 in total
  • [21] Variable selection in model-based clustering using multilocus genotype data
    Toussile W.
    Gassiat E.
    Advances in Data Analysis and Classification, 2009, 3 (2) : 109 - 134
  • [22] Automatic selection of ROIs using a model-based clustering approach
    Segovia, F.
    Gorriz, J. M.
    Ramirez, J.
    Salas-Gonzalez, D.
    Illan, I. A.
    Lopez, M.
    Chaves, R.
    Padilla, P.
    Puntonet, C. G.
    2009 IEEE NUCLEAR SCIENCE SYMPOSIUM CONFERENCE RECORD, VOLS 1-5, 2009 : 3194 - +
  • [24] Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST
    Fraley, C
    Raftery, AE
    JOURNAL OF CLASSIFICATION, 2003, 20 (02) : 263 - 286
  • [26] Variable selection in discriminant analysis based on the location model for mixed variables
    Mahat, Nor Idayu
    Krzanowski, Wojtek Janusz
    Hernandez, Adolfo
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2007, 1 (02) : 105 - 122
  • [27] clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R
    Scrucca, Luca
    Raftery, Adrian E.
    JOURNAL OF STATISTICAL SOFTWARE, 2018, 84 (01) : 1 - 28
  • [29] Nonlinear discriminant clustering based on spectral regularization
    Zhan, Yubin
    Yin, Jianping
    Liu, Xinwang
    NEURAL COMPUTING & APPLICATIONS, 2013, 22 (7-8) : 1599 - 1608
  • [30] Variable selection for model-based clustering using the integrated complete-data likelihood
    Marbac, Matthieu
    Sedki, Mohammed
    STATISTICS AND COMPUTING, 2017, 27 (04) : 1049 - 1063