A hybrid variable selection strategy based on continuous shrinkage of variable space in multivariate calibration

Cited by: 121
Authors
Yun, Yong-Huan [1, 2]
Bin, Jun [3]
Liu, Dong-Li [1]
Xu, Lin [2]
Yan, Ting-Liang [2]
Cao, Dong-Sheng [4]
Xu, Qing-Song [5]
Affiliations
[1] Hainan Univ, Coll Food Sci & Technol, Haikou 570228, Hainan, Peoples R China
[2] Chinese Acad Trop Agr Sci, Inst Environm & Plant Protect, Haikou 571101, Hainan, Peoples R China
[3] Guizhou Univ, Coll Tobacco Sci, Guiyang 550025, Guizhou, Peoples R China
[4] Cent South Univ, Xiangya Sch Pharmaceut Sci, Changsha 410013, Hunan, Peoples R China
[5] Cent South Univ, Sch Math & Stat, Changsha 410083, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Variable selection; Near-infrared spectroscopy; Multivariate calibration; Variable combination population analysis; Iteratively retains informative variables; Genetic algorithm; PARTIAL LEAST-SQUARES; WAVELENGTH INTERVAL SELECTION; GENETIC ALGORITHMS; POPULATION ANALYSIS; RANDOM FROG; REGRESSION; OPTIMIZATION; ELIMINATION; CHEMISTRY; SUBSET;
DOI
10.1016/j.aca.2019.01.022
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract
When analyzing high-dimensional near-infrared (NIR) spectral datasets, variable selection is critical to improving models' predictive abilities. However, when dealing with a large number of variables, some methods have limitations such as a high risk of overfitting, long run times, or heavy computational demands. In this study, we propose a hybrid variable selection strategy based on the continuous shrinkage of the variable space, which is the core idea of variable combination population analysis (VCPA). In the first step, the VCPA-based hybrid strategy continuously shrinks the variable space from large to small and optimizes it using a modified VCPA. In the second step, it employs iteratively retaining informative variables (IRIV) or a genetic algorithm (GA) to carry out further optimization. It takes full advantage of VCPA, GA, and IRIV, and makes up for their drawbacks when facing large numbers of variables. Three NIR datasets and three variable selection methods, including two widely used methods (competitive adaptive reweighted sampling, CARS, and genetic algorithm-interval partial least squares, GA-iPLS) and one hybrid method (variable importance in projection coupled with genetic algorithm, VIP-GA), were used to evaluate the improvement offered by the VCPA-based hybrid strategy. The results show that VCPA-GA and VCPA-IRIV significantly improve the model's prediction performance compared with the other methods, indicating that the modified VCPA step is a very efficient way to filter out uninformative variables and that the VCPA-based hybrid strategy is a promising approach to variable selection in NIR. The MATLAB source code of VCPA-GA and VCPA-IRIV can be freely downloaded from the website: https://cn.mathworks.com/matlabcentral/profile/authors/5526470-yonghuan-yun. © 2019 Elsevier B.V. All rights reserved.
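The two-step idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation, and several parts are simplifying assumptions made only for illustration: ordinary least squares with a single hold-out split stands in for PLS with cross-validation, a greedy backward elimination stands in for the IRIV or GA refinement, and every function name and parameter (`holdout_rmse`, `shrink_variable_space`, `refine`, `n_keep`, `n_iter`, `n_models`, `best_frac`) is hypothetical. The first step samples random variable subsets, keeps the variables that occur most often in the best-scoring subsets, and shrinks the space along an exponentially decreasing schedule.

```python
import numpy as np


def holdout_rmse(X, y, cols, train, test):
    """RMSE on the hold-out rows of a least-squares fit using only `cols`."""
    A_train = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
    A_test = np.column_stack([np.ones(len(test)), X[np.ix_(test, cols)]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    return float(np.sqrt(np.mean((A_test @ coef - y[test]) ** 2)))


def shrink_variable_space(X, y, n_keep=10, n_iter=20, n_models=200,
                          best_frac=0.1, seed=0):
    """Step 1 (assumed simplification): shrink the variable space to n_keep."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    perm = rng.permutation(n)
    train, test = perm[: 2 * n // 3], perm[2 * n // 3:]
    space = np.arange(p)
    # Exponentially decreasing sizes, ending exactly at n_keep.
    steps = np.arange(1, n_iter + 1) / n_iter
    sizes = np.round(p * (n_keep / p) ** steps).astype(int)
    n_best = max(1, int(best_frac * n_models))
    for size in sizes:
        # Random binary sampling: each row is one candidate variable subset.
        masks = rng.random((n_models, len(space))) < 0.5
        masks[~masks.any(axis=1), 0] = True  # avoid empty subsets
        scores = np.array(
            [holdout_rmse(X, y, space[m], train, test) for m in masks]
        )
        best = masks[np.argsort(scores)[:n_best]]
        # Keep the variables that appear most often in the best subsets.
        freq = best.mean(axis=0)
        keep = np.argsort(-freq, kind="stable")[:size]
        space = np.sort(space[keep])
    return space, train, test


def refine(X, y, space, train, test):
    """Step 2 (stand-in for IRIV/GA): greedy backward elimination."""
    cols = list(space)
    best = holdout_rmse(X, y, cols, train, test)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            if len(cols) == 1:
                break
            trial = [v for v in cols if v != c]
            score = holdout_rmse(X, y, trial, train, test)
            if score < best:  # drop a variable only if the error decreases
                best, cols, improved = score, trial, True
    return np.array(cols), best


# Synthetic data (made up for this sketch): only columns 3, 17, 29 matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 40))
y = 3.0 * X[:, 3] - 2.5 * X[:, 17] + 2.0 * X[:, 29] + 0.1 * rng.normal(size=90)

space, train, test = shrink_variable_space(X, y)
refined, refined_rmse = refine(X, y, space, train, test)
print("after shrinkage:", space)
print("after refinement:", refined, "hold-out RMSE:", refined_rmse)
```

Selecting and scoring on the same hold-out split is optimistic; a faithful implementation would use cross-validated PLS and an independent test set, as is usual in multivariate calibration.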
Pages: 58-69
Page count: 12
Related papers
50 in total
  • [21] Epistasis-based FSA: Two versions of a novel approach for variable selection in multivariate calibration
    de Paula, Lauro C. M.
    Soares, Anderson S.
    Soares, Telma W.
    Junior, Celso G. C.
    Coelho, Clarimar J.
    de Oliveira, Anselmo E.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2019, 81 : 213 - 222
  • [22] A Novel Variable Selection Method Based on Binning-Normalized Mutual Information for Multivariate Calibration
    Zhong, Liang
    Huang, Ruiqi
    Gao, Lele
    Yue, Jianan
    Zhao, Bing
    Nie, Lei
    Li, Lian
    Wu, Aoli
    Zhang, Kefan
    Meng, Zhaoqing
    Cao, Guiyun
    Zhang, Hui
    Zang, Hengchang
    MOLECULES, 2023, 28 (15):
  • [23] PLS pruning: a new approach to variable selection for multivariate calibration based on Hessian matrix of errors
    Lima, SLT
    Mello, C
    Poppi, RJ
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2005, 76 (01) : 73 - 78
  • [24] Variable selection using shrinkage priors
    Li, Hanning
    Pati, Debdeep
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2017, 107 : 107 - 119
  • [25] A variable selection strategy for supervised classification with continuous spectroscopic data
    Indahl, U
    Næs, T
    JOURNAL OF CHEMOMETRICS, 2004, 18 (02) : 53 - 61
  • [26] An advanced variable selection method based on information gain and Fisher criterion reselection iteration for multivariate calibration
    Liu, Hubin
    Yuan, Yuhui
    Wang, Ge
    Xu, Weijie
    Zhao, Longlian
    Li, Junhui
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2023, 235
  • [27] Multivariate Chaotic Time Series Prediction Based on ELM–PLSR and Hybrid Variable Selection Algorithm
    Min Han
    Ruiquan Zhang
    Meiling Xu
    Neural Processing Letters, 2017, 46 : 705 - 717
  • [28] Adaptive Variable Re-weighting and Shrinking Approach for Variable Selection in Multivariate Calibration for Near-infrared Spectroscopy
    Sun Jing-Jing
    Yang Wu-De
    Feng Mei-Chen
    Xiao Lu-Jie
    Sun Hui
    Kubar, Muhammad-Saleem
    CHINESE JOURNAL OF ANALYTICAL CHEMISTRY, 2021, 49 (05) : E21079 - E21086
  • [29] Variable selection and validation in multivariate modelling
    Shi, Lin
    Westerhuis, Johan A.
    Rosen, Johan
    Landberg, Rikard
    Brunius, Carl
    BIOINFORMATICS, 2019, 35 (06) : 972 - 980
  • [30] Multivariate Bayesian variable selection and prediction
    Brown, PJ
    Vannucci, M
    Fearn, T
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1998, 60 : 627 - 641