Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space

Cited by: 49
Authors
Luo, Shan [1 ]
Chen, Zehua [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Math, Shanghai 200030, Peoples R China
[2] Natl Univ Singapore, Dept Stat & Appl Probabil, Singapore 117548, Singapore
Keywords
Extended BIC; Oracle property; Selection consistency; Sparse high-dimensional linear models; NONCONCAVE PENALIZED LIKELIHOOD; ORTHOGONAL MATCHING PURSUIT; VARIABLE SELECTION; MODEL SELECTION; SIGNAL RECOVERY; ORACLE PROPERTIES; ADAPTIVE LASSO; LINEAR-MODELS; REGRESSION; SHRINKAGE;
DOI
10.1080/01621459.2013.877275
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
In this article, we propose a method called sequential Lasso (SLasso) for feature selection in sparse high-dimensional linear models. SLasso selects features by sequentially solving partially penalized least squares problems in which the features selected in earlier steps are not penalized. SLasso uses the extended BIC (EBIC) as its stopping rule: the procedure stops when EBIC reaches a minimum. The asymptotic properties of SLasso are considered when the dimension of the feature space is ultra high and the number of relevant features diverges. We show that, with probability converging to 1, SLasso selects all the relevant features before any irrelevant feature is selected, and that the EBIC decreases until it attains its minimum at the model consisting of exactly the relevant features and then begins to increase. These results establish the selection consistency of SLasso. The SLasso estimators of the final model are ordinary least squares estimators, and the selection consistency implies the oracle property of SLasso. The asymptotic distribution of the SLasso estimators with a diverging number of relevant features is provided. SLasso is compared with other methods in simulation studies, which demonstrate that SLasso is a desirable approach with an edge over the other methods. SLasso, together with the other methods, is applied to microarray data for mapping disease genes. Supplementary materials for this article are available online.
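The abstract describes a sequential scheme: at each step a new feature is added while earlier selections stay unpenalized, the model is refit by OLS, and EBIC is tracked until it turns upward. The following is a minimal sketch of that loop, not the authors' implementation: it replaces the partially penalized Lasso step with the greedy surrogate of picking the unselected feature most correlated with the current OLS residual, and assumes centered inputs and the common EBIC form n·log(RSS/n) + k·log(n) + 2γ·k·log(p). The function names `ebic` and `sequential_lasso` are illustrative.

```python
import numpy as np

def ebic(rss, n, k, p, gamma=0.5):
    # Extended BIC: n*log(RSS/n) + k*log(n) + 2*gamma*k*log(p)
    return n * np.log(rss / n) + k * np.log(n) + 2 * gamma * k * np.log(p)

def sequential_lasso(X, y, gamma=0.5, max_steps=None):
    """Greedy sketch of the sequential selection loop with an EBIC stop.

    Assumes columns of X and y are centered (no intercept is fit).
    """
    n, p = X.shape
    max_steps = max_steps or min(n - 1, p)
    selected, best_set = [], []
    resid = y.copy()
    best_ebic = np.inf
    for _ in range(max_steps):
        # Score each feature by |correlation| with the current residual;
        # already-selected features are excluded from the search.
        scores = np.abs(X.T @ resid)
        scores[selected] = -np.inf
        selected.append(int(np.argmax(scores)))
        # Refit the selected submodel by ordinary least squares.
        Xs = X[:, selected]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        e = ebic(float(resid @ resid), n, len(selected), p, gamma)
        if e < best_ebic:
            best_ebic, best_set = e, selected.copy()
        else:
            break  # EBIC has started to increase: stop and keep the minimum
    return best_set, best_ebic
```

On data with a few strong signals, the loop typically adds the relevant features first and halts as soon as an irrelevant feature fails to lower EBIC, mirroring the behavior the abstract proves asymptotically.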
Pages: 1229-1240
Number of pages: 12
Related Papers
50 records
  • [31] Deep feature screening: Feature selection for ultra high-dimensional data via deep neural networks
    Li, Kexuan
    Wang, Fangfang
    Yang, Lingli
    Liu, Ruiqi
    NEUROCOMPUTING, 2023, 538
  • [32] Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data
    Yamada, Makoto
    Tang, Jiliang
    Lugo-Martinez, Jose
    Hodzic, Ermin
    Shrestha, Raunak
    Saha, Avishek
    Ouyang, Hua
    Yin, Dawei
    Mamitsuka, Hiroshi
    Sahinalp, Cenk
    Radivojac, Predrag
    Menczer, Filippo
    Chang, Yi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (07) : 1352 - 1365
  • [33] Model-free feature screening for ultra-high dimensional competing risks data
    Chen, Xiaolin
    Zhang, Yahui
    Liu, Yi
    Chen, Xiaojing
    STATISTICS & PROBABILITY LETTERS, 2020, 164
  • [34] Fully Bayesian logistic regression with hyper-LASSO priors for high-dimensional feature selection
    Li, Longhai
    Yao, Weixin
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (14) : 2827 - 2851
  • [35] Clustering-based Sequential Feature Selection Approach for High Dimensional Data Classification
    Alimoussa, M.
    Porebski, A.
    Vandenbroucke, N.
    Thami, R. Oulad Haj
    El Fkihi, S.
    VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 4: VISAPP, 2021, : 122 - 132
  • [36] A sequential feature selection procedure for high-dimensional Cox proportional hazards model
    Ke Yu
    Shan Luo
    Annals of the Institute of Statistical Mathematics, 2022, 74 : 1109 - 1142
  • [37] Feature Selection in High-Dimensional Space with Applications to Gene Expression Data
    Pantha, Nishan
    Ramasubramanian, Muthukumaran
    Gurung, Iksha
    Maskey, Manil
    Sanders, Lauren M.
    Casaletto, James
    Costes, Sylvain V.
    SOUTHEASTCON 2024, 2024, : 6 - 15
  • [38] ON THE ADVERSARIAL ROBUSTNESS OF FEATURE SELECTION USING LASSO
    Li, Fuwei
    Lai, Lifeng
    Cui, Shuguang
    PROCEEDINGS OF THE 2020 IEEE 30TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2020,
  • [39] On the Adversarial Robustness of LASSO Based Feature Selection
    Li, Fuwei
    Lai, Lifeng
    Cui, Shuguang
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 5555 - 5567