A Partial Least Squares based algorithm for parsimonious variable selection

Cited by: 72
Authors
Mehmood, Tahir [1]
Martens, Harald [2]
Saebo, Solve [1]
Warringer, Jonas [2, 3]
Snipen, Lars [1]
Affiliations
[1] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, Ås, Norway
[2] Norwegian Univ Life Sci, Ctr Integrat Genet CIGENE Anim & Aquacultural Sci, Ås, Norway
[3] Univ Gothenburg, Dept Cell & Mol Biol, Gothenburg, Sweden
Source
Keywords
NEAR-INFRARED SPECTROSCOPY; DIMENSIONAL GENOMIC DATA; SYNONYMOUS CODON USAGE; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; BACTERIAL GENOME; PLS REGRESSION; ELIMINATION; LATENT; GENE;
DOI
10.1186/1748-7188-6-27
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: In genomics, a commonly encountered problem is to extract a subset of variables from a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, and variation of model performance on test data are issues to be addressed for such problems. Results: We present an algorithm that balances the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables, which significantly improves the understandability of the model. Within the approach we have tested and compared three criteria commonly used for variable selection in the Partial Least Squares modeling paradigm: loading weights, regression coefficients, and variable importance in projection. The algorithm is applied to the problem of identifying codon variations that discriminate between bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection. Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared with standard approaches.
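The elimination idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the record: a one-component PLS1 fit rather than reduced-rank PLS, absolute loading weights as the only ranking criterion, a fixed `drop_fraction` per round, and a relative RMSE `tolerance` standing in for the "marginal decrease in model performance". All function and parameter names are hypothetical.

```python
import random

def fit_pls1(X, y):
    """One-component PLS1 on centered data (a deliberate simplification)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Loading weights: the direction X'y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    # Scores t = Xw, then regress y on t to get the coefficients.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(a * b for a, b in zip(t, yc)) / (sum(a * a for a in t) or 1.0)
    return {"w": w, "coef": [q * v for v in w], "x_mean": x_mean, "y_mean": y_mean}

def rmse(model, X, y):
    """Root mean squared prediction error of a fitted model."""
    total = 0.0
    for row, target in zip(X, y):
        pred = model["y_mean"] + sum(
            c * (x - m) for c, x, m in zip(model["coef"], row, model["x_mean"]))
        total += (pred - target) ** 2
    return (total / len(y)) ** 0.5

def take(X, idx):
    """Keep only the columns listed in idx."""
    return [[row[j] for j in idx] for row in X]

def eliminate(X_tr, y_tr, X_val, y_val, drop_fraction=0.25, tolerance=0.05):
    """Repeatedly drop the variables with the smallest |loading weight|.

    Returns the smallest subset whose validation RMSE stays within
    `tolerance` (relative) of the best RMSE seen, plus the full history.
    """
    active = list(range(len(X_tr[0])))
    history = []  # (validation RMSE, variable indices) for each round
    while True:
        model = fit_pls1(take(X_tr, active), y_tr)
        history.append((rmse(model, take(X_val, active), y_val), list(active)))
        if len(active) == 1:
            break
        n_drop = max(1, int(drop_fraction * len(active)))
        ranked = sorted(range(len(active)), key=lambda k: abs(model["w"][k]))
        dropped = set(ranked[:n_drop])
        active = [v for k, v in enumerate(active) if k not in dropped]
    best = min(err for err, _ in history)
    acceptable = [s for err, s in history if err <= best * (1 + tolerance)]
    return min(acceptable, key=len), history

# Synthetic data (made up for illustration): only variables 0 and 1 matter.
random.seed(0)

def make_data(n, p=10):
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + random.gauss(0, 0.1) for row in X]
    return X, y

X_tr, y_tr = make_data(60)
X_val, y_val = make_data(40)
selected, history = eliminate(X_tr, y_tr, X_val, y_val)
print("kept variables:", selected)
```

Choosing the smallest subset within a tolerance of the best error, rather than the best-scoring subset itself, is what trades a small loss in predictive performance for a shorter variable list.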
Pages: 12
Related papers
50 in total
  • [31] Variable selection in partial least squares with the weighted variable contribution to the first singular value of the covariance matrix
    Lin, Weilu
    Hang, Haifeng
    Zhuang, Yingping
    Zhang, Siliang
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2018, 183 : 113 - 121
  • [32] Comparison of Different Variable Selection Methods for Partial Least Squares Soft Sensor Development
    Wang, Zi Xiu
    He, Qinghua
    Wang, Jin
    2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 3116 - 3121
  • [33] Method of wavelength selection for partial least squares
    Osborne, SD
    Jordan, RB
    Künnemeyer, R
    ANALYST, 1997, 122 (12) : 1531 - 1537
  • [34] Boosting the Performance of Genetic Algorithms for Variable Selection in Partial Least Squares Spectral Calibrations
    Lavine, Barry K.
    White, Collin G.
    APPLIED SPECTROSCOPY, 2017, 71 (09) : 2092 - 2101
  • [35] Model selection for partial least squares regression
    Li, BB
    Morris, J
    Martin, EB
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2002, 64 (01) : 79 - 89
  • [36] Partial Least Squares Discriminant Analysis Model Based on Variable Selection Applied to Identify the Adulterated Olive Oil
    Xinhui Li
    Sulan Wang
    Weimin Shi
    Qi Shen
    Food Analytical Methods, 2016, 9 : 1713 - 1718
  • [37] Partial Least Squares Discriminant Analysis Model Based on Variable Selection Applied to Identify the Adulterated Olive Oil
    Li, Xinhui
    Wang, Sulan
    Shi, Weimin
    Shen, Qi
    FOOD ANALYTICAL METHODS, 2016, 9 (06) : 1713 - 1718
  • [38] Relief wrapper based Kernel Partial Least Squares subspace selection
    Zhang, Buqun
    Zheng, Shangzhi
    Bu, Hualong
    Xia, Jing
    2009 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 3, 2009, : 44 - 48
  • [39] Feature Selection Approach based on Mutual Information and Partial Least Squares
    Shi, Qiang
    Tang, Jian
    Zhao, Lijie
    MATERIALS RESEARCH AND APPLICATIONS, PTS 1-3, 2014, 875-877 : 2025 - +
  • [40] THE PARTIAL TOTAL LEAST-SQUARES ALGORITHM
    VANHUFFEL, S
    VANDEWALLE, J
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1988, 21 (03) : 333 - 341