Variable selection in model-based clustering and discriminant analysis with a regularization approach

Cited by: 12
Authors
Celeux, Gilles [1 ,2 ]
Maugis-Rabusseau, Cathy [3 ]
Sedki, Mohammed [4 ,5 ]
Affiliations
[1] INRIA, Dept Math, Bâtiment 425, F-91405 Orsay, France
[2] Univ Paris Sud, Bâtiment 425, F-91405 Orsay, France
[3] Univ Toulouse, Inst Math Toulouse, UMR 5219, INSA Toulouse, 135 Ave Rangueil, F-31077 Toulouse 4, France
[4] Paris Sud Univ, Batiment 15-16,16 Ave Paul Vaillant Couturier, F-94807 Villejuif, France
[5] Hop Paul Brousse, INSERM, U1181, Batiment 15-16,16 Ave Paul Vaillant Couturier, F-94807 Villejuif, France
Keywords
Variable selection; Lasso; Gaussian mixture; Clustering; Classification;
DOI
10.1007/s11634-018-0322-5
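Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code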
020208 ; 070103 ; 0714 ;
Abstract
Several methods for variable selection have been proposed in model-based clustering and classification. These make use of backward or forward procedures to define the roles of the variables. Unfortunately, such stepwise procedures are slow and the resulting algorithms inefficient when analyzing large data sets with many variables. In this paper, we propose an alternative regularization approach for variable selection in model-based clustering and classification. In our approach the variables are first ranked using a lasso-like procedure in order to avoid slow stepwise algorithms. Thus, the variable selection methodology of Maugis et al. (Comput Stat Data Anal 53:3872-3882, 2009b) can be efficiently applied to high-dimensional data sets.
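The abstract does not spell out the ranking step, so the following is only a minimal sketch of the general idea of ranking variables along a lasso-like regularization path. It assumes a supervised (discriminant analysis) setting with known labels and uses soft-thresholded standardized class means as a stand-in for the lasso-like procedure; it is not the authors' algorithm. The names `soft_threshold` and `rank_variables` and the grid of penalties `lambdas` are hypothetical, introduced for illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Elementwise soft-thresholding, the proximal operator of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def rank_variables(X, labels, lambdas):
    """Rank variables by how long their class means survive l1 shrinkage.

    Assumption (not from the source): a variable is scored by the number of
    penalty values in `lambdas` for which at least one standardized class
    mean is still nonzero after soft-thresholding.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Standardize each variable so one penalty scale applies to all of them.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Class means of the standardized data, shape (n_classes, n_variables).
    classes = np.unique(labels)
    means = np.stack([Z[labels == k].mean(axis=0) for k in classes])

    scores = np.zeros(X.shape[1], dtype=int)
    for lam in lambdas:
        shrunk = soft_threshold(means, lam)
        # A variable is "active" at this penalty if any class mean is nonzero.
        scores += np.any(shrunk != 0.0, axis=0)

    # Most persistent variables first; stable sort keeps ties in input order.
    order = np.argsort(-scores, kind="stable")
    return order, scores
```

With such a ranking in hand, a role-assignment procedure only needs to scan the variables once in the ranked order rather than run a full forward or backward stepwise search, which is the efficiency argument the abstract makes.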
Pages: 259-278
Number of pages: 20
Related papers
50 in total
  • [1] Variable selection in model-based clustering and discriminant analysis with a regularization approach
    Gilles Celeux
    Cathy Maugis-Rabusseau
    Mohammed Sedki
    Advances in Data Analysis and Classification, 2019, 13 : 259 - 278
  • [2] Variable selection in model-based discriminant analysis
    Maugis, C.
    Celeux, G.
    Martin-Magniette, M-L
    JOURNAL OF MULTIVARIATE ANALYSIS, 2011, 102 (10) : 1374 - 1387
  • [3] Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering
    Celeux, Gilles
    Martin-Magniette, Marie-Laure
    Maugis-Rabusseau, Cathy
    Raftery, Adrian E.
    JOURNAL OF THE SFDS, 2014, 155 (02) : 57 - 71
  • [4] Variable selection in penalized model-based clustering via regularization on grouped parameters
    Xie, Benhuai
    Pan, Wei
    Shen, Xiaotong
    BIOMETRICS, 2008, 64 (03) : 921 - 930
  • [5] Variable selection for model-based clustering
    Raftery, AE
    Dean, N
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) : 168 - 178
  • [6] A simple model-based approach to variable selection in classification and clustering
    Partovi Nia, Vahid
    Davison, Anthony C.
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2015, 43 (02) : 157 - 175
  • [7] On Model-Based Clustering, Classification, and Discriminant Analysis
    McNicholas, Paul D.
    JIRSS-JOURNAL OF THE IRANIAN STATISTICAL SOCIETY, 2011, 10 (02) : 181 - 199
  • [8] Variable selection methods for model-based clustering
    Fop, Michael
    Murphy, Thomas Brendan
    STATISTICS SURVEYS, 2018, 12 : 18 - 65
  • [9] SelvarClustMV: Variable selection approach in model-based clustering allowing for missing values
    Maugis-Rabusseau, Cathy
    Martin-Magniette, Marie-Laure
    Pelletier, Sandra
    JOURNAL OF THE SFDS, 2012, 153 (02) : 21 - 36
  • [10] Choosing models in model-based clustering and discriminant analysis
    Biernacki, C
    Govaert, G
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 1999, 64 (01) : 49 - 71