Variable selection in model-based discriminant analysis

Cited by: 26
Authors
Maugis, C. [1]
Celeux, G. [2]
Martin-Magniette, M-L [3, 4]
Affiliations
[1] Univ Toulouse, INSA Toulouse, Inst Math Toulouse, F-31077 Toulouse 4, France
[2] Inria Saclay Ile de France, Sophia Antipolis, France
[3] UMR AgroParisTech INRA MIA 518, Paris, France
[4] ERL CNRS 8196, UEVE, URGV UMR INRA 1165, Evry, France
Keywords
Discriminant; Redundant or independent variables; Variable selection; Gaussian classification models; Linear regression; BIC; Classification
DOI
10.1016/j.jmva.2011.05.004
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three different roles are considered for each possible predictor: a variable can be relevant for classification, or it can be irrelevant, in which case it is either linearly dependent on a subset of the relevant predictors or independent of them. This variable selection model was inspired by previous work on variable selection in model-based clustering. A BIC-like model selection criterion is proposed. It is optimized through two embedded forward stepwise variable selection algorithms for classification and linear regression. The model identifiability and the consistency of the variable selection criterion are proved. Numerical experiments on simulated and real data sets illustrate the value of this variable selection methodology. In particular, it is shown that this well-grounded variable selection model can be of great interest for improving the classification performance of quadratic discriminant analysis in a high-dimensional context. (C) 2011 Elsevier Inc. All rights reserved.
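The idea of scoring variable subsets with a BIC-like criterion and growing the subset by forward steps can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified two-role version in which the selected variables get class-specific Gaussians (as in quadratic discriminant analysis) and all remaining variables are regressed on every selected variable, with no inner stepwise regression and no separate "independent" role. The function names `gaussian_mle_loglik`, `bic`, `forward_select` and `simulate` are hypothetical, and the simulated data are invented for illustration only.

```python
import numpy as np


def gaussian_mle_loglik(resid):
    """Maximised Gaussian log-likelihood of centred residual rows (n x d)."""
    n, d = resid.shape
    if d == 0:
        return 0.0
    cov = resid.T @ resid / n  # MLE covariance of the residuals
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + d)


def bic(X, y, selected):
    """BIC-like score (higher is better) for a candidate set of relevant variables.

    Assumed simplified model: class-specific Gaussians on the selected
    variables, and one class-independent linear regression of all other
    variables on the selected ones.
    """
    n, p = X.shape
    S = np.array(sorted(selected), dtype=int)
    W = np.setdiff1d(np.arange(p), S)
    s, w = len(S), len(W)
    classes = np.unique(y)

    loglik = 0.0
    for k in classes:
        Xk = X[y == k][:, S]
        loglik += gaussian_mle_loglik(Xk - Xk.mean(axis=0))
    n_params = len(classes) * (s + s * (s + 1) // 2)

    if w > 0:
        design = np.column_stack([np.ones(n), X[:, S]])
        coef, *_ = np.linalg.lstsq(design, X[:, W], rcond=None)
        loglik += gaussian_mle_loglik(X[:, W] - design @ coef)
        n_params += w * (s + 1) + w * (w + 1) // 2

    return 2.0 * loglik - n_params * np.log(n)


def forward_select(X, y):
    """Greedy forward search: add the variable that most improves the score."""
    p = X.shape[1]
    selected = []
    best = bic(X, y, selected)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [bic(X, y, selected + [j]) for j in candidates]
        i = int(np.argmax(scores))
        if scores[i] <= best:
            break  # no candidate improves the criterion
        selected.append(candidates[i])
        best = scores[i]
    return selected, best


def simulate(n=400, seed=0):
    """Toy data: x0, x1 discriminate; x2 depends on x0; x3 is pure noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    x0 = rng.normal(size=n) + 3.0 * y
    x1 = rng.normal(size=n) - 2.0 * y
    x2 = 0.8 * x0 + 0.5 * rng.normal(size=n)
    x3 = rng.normal(size=n)
    return np.column_stack([x0, x1, x2, x3]), y


X, y = simulate()
selected, score = forward_select(X, y)
print("selected variables:", sorted(selected))
```

Because every candidate subset is scored with a likelihood for all variables jointly, scores for different subsets are comparable. A forward-only search never drops a variable once it is added, so a redundant variable picked early would stay in the set.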
Pages: 1374-1387
Number of pages: 14
Related papers
50 in total
  • [31] Variable selection for model-based high-dimensional clustering
    Wang, Sijian
    Zhu, Ji
    PREDICTION AND DISCOVERY, 2007, 443 : 177 - +
  • [32] Input variable selection for model-based production control and optimisation
    Glavan, Miha
    Gradisar, Dejan
    Atanasijevic-Kunc, Maja
    Strmcnik, Stanko
    Music, Gasper
    INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2013, 68 (9-12): : 2743 - 2759
  • [33] Robust variable selection for model-based learning in presence of adulteration
    Cappozzo, Andrea
    Greselin, Francesca
    Murphy, Thomas Brendan
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 158
  • [35] Model-based clustering, classification, and discriminant analysis of data with mixed type
    Browne, Ryan P.
    McNicholas, Paul D.
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2012, 142 (11) : 2976 - 2984
  • [36] Pairwise Variable Selection for High-Dimensional Model-Based Clustering
    Guo, Jian
    Levina, Elizaveta
    Michailidis, George
    Zhu, Ji
    BIOMETRICS, 2010, 66 (03) : 793 - 804
  • [37] Variable selection in model-based clustering using multilocus genotype data
    Toussile W.
    Gassiat E.
    Advances in Data Analysis and Classification, 2009, 3 (2) : 109 - 134
  • [38] Partial Least Squares Discriminant Analysis Model Based on Variable Selection Applied to Identify the Adulterated Olive Oil
    Xinhui Li
    Sulan Wang
    Weimin Shi
    Qi Shen
    Food Analytical Methods, 2016, 9 : 1713 - 1718
  • [39] Variable Selection in PLS Discriminant Analysis via the Disco
    Simonetti, Biagio
    Lucadamo, Antonio
    Rodriguez, Maria R. G.
    CURRENT ANALYTICAL CHEMISTRY, 2012, 8 (02) : 266 - 272
  • [40] DALASS: Variable selection in discriminant analysis via the LASSO
    Trendafilov, Nickolay T.
    Jolliffe, Ian T.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (08) : 3718 - 3736