A Partial Least Squares based algorithm for parsimonious variable selection

Cited by: 72
Authors
Mehmood, Tahir [1 ]
Martens, Harald [2 ]
Saebo, Solve [1 ]
Warringer, Jonas [2 ,3 ]
Snipen, Lars [1 ]
Affiliations
[1] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, Ås, Norway
[2] Norwegian Univ Life Sci, Ctr Integrat Genet CIGENE Anim & Aquacultural Sci, Ås, Norway
[3] Univ Gothenburg, Dept Cell & Mol Biol, Gothenburg, Sweden
Source
Keywords
NEAR-INFRARED SPECTROSCOPY; DIMENSIONAL GENOMIC DATA; SYNONYMOUS CODON USAGE; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; BACTERIAL GENOME; PLS REGRESSION; ELIMINATION; LATENT; GENE;
DOI
10.1186/1748-7188-6-27
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: In genomics, a commonly encountered problem is to extract a subset of variables out of a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, and variation of model performance on test data are issues to be addressed for such problems. Results: We present an algorithm balancing the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables. This significantly improves the understandability of the model. Within the approach we have tested and compared three different criteria commonly used in the Partial Least Squares modeling paradigm for variable selection: loading weights, regression coefficients and variable importance on projections. The algorithm is applied to a problem of identifying codon variations discriminating different bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection. Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared to standard approaches.
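The abstract does not spell out the elimination procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a one-component PLS1 model, uses the absolute loading weights as the ranking criterion, drops a fixed fraction of the lowest-ranked variables per iteration, and then picks the smallest subset whose validation error stays within a relative tolerance of the best error seen. The names `fit_pls1`, `eliminate`, `step`, `tolerance` and the synthetic data are all hypothetical choices made for illustration.

```python
import random

def fit_pls1(X, y):
    """Fit a one-component PLS1 model; returns (x_means, y_mean, w, q)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Loading weights: proportional to the covariance of each variable with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    # Scores t = Xc w, then regress y on the single score vector.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(a * b for a, b in zip(t, yc)) / (sum(a * a for a in t) or 1.0)
    return x_means, y_mean, w, q

def mse(model, X, y):
    """Mean squared prediction error of a fitted model on (X, y)."""
    x_means, y_mean, w, q = model
    preds = [y_mean + q * sum((x - m) * wj for x, m, wj in zip(row, x_means, w))
             for row in X]
    return sum((a - b) ** 2 for a, b in zip(preds, y)) / len(y)

def take(X, cols):
    """Keep only the listed columns of X."""
    return [[row[j] for j in cols] for row in X]

def eliminate(X_tr, y_tr, X_val, y_val, step=0.25, tolerance=0.10):
    """Backward elimination; step and tolerance are assumed tuning knobs."""
    active = list(range(len(X_tr[0])))
    path = []  # (variable subset, validation error) for every iteration
    while True:
        model = fit_pls1(take(X_tr, active), y_tr)
        path.append((list(active), mse(model, take(X_val, active), y_val)))
        if len(active) == 1:
            break
        weights = model[2]
        order = sorted(range(len(active)), key=lambda k: abs(weights[k]))
        n_drop = min(len(active) - 1, max(1, int(step * len(active))))
        dropped = set(order[:n_drop])
        active = [v for k, v in enumerate(active) if k not in dropped]
    best = min(err for _, err in path)
    # Trade a marginal loss in performance for parsimony: walk from the
    # smallest subset upward and return the first one within tolerance.
    for subset, err in reversed(path):
        if err <= best * (1 + tolerance):
            return subset, err, path

# Synthetic data (assumption): only variables 0 and 1 influence the response.
rng = random.Random(0)

def make_data(n, p=10):
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2 * row[0] - 3 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y

X_tr, y_tr = make_data(200)
X_val, y_val = make_data(100)
subset, err, path = eliminate(X_tr, y_tr, X_val, y_val)
print(sorted(subset), round(err, 3))
```

Regression coefficients or variable importance on projections could be swapped in as the ranking criterion by changing the `key` used to sort the active variables. A multi-component PLS fit and cross-validated performance would be needed for a realistic comparison.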
Pages: 12
Related papers
50 in total
  • [41] PARTIAL TOTAL LEAST-SQUARES ALGORITHM
    VANHUFFEL, S
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1990, 33 (01) : 113 - 121
  • [42] AN ONLINE NIPALS ALGORITHM FOR PARTIAL LEAST SQUARES
    Stott, Alexander E.
    Kanna, Sithan
    Mandic, Danilo P.
    Pike, William T.
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4177 - 4181
  • [43] Robust recursive partial least squares algorithm
    College of Mechanical and Vehicle Engineering, Hunan Univ, Changsha, Hunan 410082, China
    Unknown
    Hunan Daxue Xuebao, 2009, 9 (42-46):
  • [44] Improved variable reduction in partial least squares modelling based on Predictive-Property-Ranked Variables and adaptation of partial least squares complexity
    Andries, Jan P. M.
    Vander Heyden, Yvan
    Buydens, Lutgarde M. C.
    ANALYTICA CHIMICA ACTA, 2011, 705 (1-2) : 292 - 305
  • [45] Analysis of partial least squares algorithm based on SBM-DEA
    Du, Jianqiang
    Hao, Zhulin
    Wang, Guolong
    Yu, Riyue
    Nie, Bin
    Xiong, Wangping
    Journal of Chemical and Pharmaceutical Research, 2014, 6 (07) : 718 - 724
  • [46] Deep partial least squares for instrumental variable regression
    Nareklishvili, Maria
    Polson, Nicholas
    Sokolov, Vadim
    APPLIED STOCHASTIC MODELS IN BUSINESS AND INDUSTRY, 2023, 39 (06) : 734 - 754
  • [47] A statistical method for massive data based on partial least squares algorithm
    Xu Y.
    Applied Mathematics and Nonlinear Sciences, 2024, 9 (01)
  • [48] An algorithm for outdoor illumination estimation based on partial least squares method
    Yang, Meiyan
    Wu, Zhihong
    Liu, Yanli
    Qin, Xueying
    Peng, Qunsheng
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2012, 24 (04) : 541 - 547
  • [49] A Novel Kernel Matrix Isomap Algorithm Based on Partial Least Squares
    Li, Bing
    Guo, Feng Ming
    He, Yi Gang
    ADVANCED RESEARCH ON ENGINEERING MATERIALS, ENERGY, MANAGEMENT AND CONTROL, PTS 1 AND 2, 2012, 424-425 : 577 - +
  • [50] Least Squares Twin SVM Based On Partial Binary Tree Algorithm
    Yu, Qing
    Liu, Rui
    2018 11TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2018), 2018,