A Partial Least Squares based algorithm for parsimonious variable selection

Cited by: 72
Authors
Mehmood, Tahir [1]
Martens, Harald [2]
Saebo, Solve [1]
Warringer, Jonas [2, 3]
Snipen, Lars [1]
Affiliations
[1] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, Ås, Norway
[2] Norwegian Univ Life Sci, Ctr Integrat Genet CIGENE Anim & Aquacultural Sci, Ås, Norway
[3] Univ Gothenburg, Dept Cell & Mol Biol, Gothenburg, Sweden
Source
Keywords
NEAR-INFRARED SPECTROSCOPY; DIMENSIONAL GENOMIC DATA; SYNONYMOUS CODON USAGE; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; BACTERIAL GENOME; PLS REGRESSION; ELIMINATION; LATENT; GENE;
DOI
10.1186/1748-7188-6-27
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: In genomics, a commonly encountered problem is to extract a subset of variables from a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, and variation of model performance on test data are issues to be addressed for such problems.
Results: We present an algorithm balancing the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables. This significantly improves the understandability of the model. Within the approach we have tested and compared three different criteria commonly used in the Partial Least Squares modeling paradigm for variable selection: loading weights, regression coefficients and variable importance on projections. The algorithm is applied to a problem of identifying codon variations discriminating different bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection.
Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared to standard approaches.
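The elimination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single quantitative response (PLS1 regression rather than the classification setting of the paper), ranks variables only by variable importance on projections, removes a fixed fraction of the lowest-ranked variables per iteration, and replaces the paper's performance comparison with a simple relative tolerance on validation error. The function names (`pls1_fit`, `vip_scores`, `eliminate`) and the parameters `n_comp`, `frac`, and `tol` are hypothetical, as is the synthetic data.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 via NIPALS on centered data (assumed simplification: one response)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, T, P, q = [], [], [], []
    for _ in range(min(n_comp, X.shape[1])):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)        # unit-norm loading weights
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X-loadings
        qa = (yk @ t) / tt               # y-loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - qa * t                 # deflate y
        W.append(w); T.append(t); P.append(p); q.append(qa)
    W, T, P, q = np.array(W).T, np.array(T).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return W, T, q, b

def vip_scores(W, T, q):
    """Variable importance on projections from weights, scores and y-loadings."""
    ss = q ** 2 * np.sum(T ** 2, axis=0)  # response variance explained per component
    return np.sqrt(W.shape[0] * ((W ** 2) @ ss) / ss.sum())

def eliminate(X_tr, y_tr, X_va, y_va, n_comp=2, frac=0.25, tol=0.10):
    """Backward elimination; returns the smallest subset within `tol` of the best error."""
    active = np.arange(X_tr.shape[1])
    path = []
    while True:
        W, T, q, b = pls1_fit(X_tr[:, active], y_tr, n_comp)
        centered = X_va[:, active] - X_tr[:, active].mean(axis=0)
        err = float(np.mean((y_va - (centered @ b + y_tr.mean())) ** 2))
        path.append((active.copy(), err))
        if active.size == 1:
            break
        n_drop = max(1, int(frac * active.size))         # drop the lowest-ranked share
        keep = np.argsort(vip_scores(W, T, q))[n_drop:]
        active = np.sort(active[keep])
    best = min(err for _, err in path)
    # Accept a marginal loss in performance in exchange for fewer variables.
    candidates = [v for v, err in path if err <= best * (1 + tol)]
    return min(candidates, key=len), path

# Synthetic data (assumption): only variables 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=120)
selected, path = eliminate(X[:80], y[:80], X[80:], y[80:])
print(sorted(selected.tolist()))
```

The tolerance `tol` plays the role of the "marginal decrease in model performance" mentioned in the abstract: a larger value trades more validation error for a smaller selected set. Swapping `vip_scores` for the absolute loading weights or regression coefficients would give the other two ranking criteria the abstract compares.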
Pages: 12