Bootstrapping soft shrinkage variable selection method based on the combination of frequency and regression coefficient

Authors
Zhang F. [1]
Tang X. [1]
Tong A. [1]
Wang B. [1]
Wang J. [1]
Affiliation
[1] State Key Laboratory of Electrical Insulation & Power Equipment, Xi'an Jiaotong University, Xi'an
Keywords
Near infrared spectroscopy; Partial least squares; Wavelength selection; Weighted bootstrap sampling
DOI
10.19650/j.cnki.cjsi.J1905728
Abstract
A Fourier transform infrared spectrometer produces an enormous number of spectral lines, and using all of them directly in a multiple linear regression easily leads to over-fitting, poor stability, and a long analysis period. To address these problems, this paper proposes a bootstrapping soft shrinkage variable selection method based on the combination of frequency and regression coefficient. The method selects variables according to their weights: in each iteration, a new weight is calculated for every variable from its regression coefficient and its frequency, and soft shrinkage of the variables is achieved through weighted bootstrap sampling. The method was validated on infrared spectral datasets of corn. On the corn oil dataset, the root mean square error of prediction (RMSEP) and the correlation coefficient of prediction (Rp) are 0.0202 and 0.9765, respectively, and the number of variables is reduced from the original 700 to 13. On the corn protein dataset, the RMSEP and Rp are 0.0279 and 0.9968, respectively, and the number of variables is reduced from the original 700 to 16. The results show that the proposed variable selection algorithm selects fewer and more relevant variables and has practical application value. © 2020, Science Press. All rights reserved.
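The abstract describes the selection loop only in outline: per-variable weights, a weight update from regression coefficients and frequency, and soft shrinkage through weighted bootstrap sampling. The NumPy sketch below is a hypothetical reconstruction, not the authors' code. It follows the general bootstrapping soft shrinkage (BOSS) scheme of Deng et al. (2016) and adds a frequency term, under these assumptions: the new weight is a convex combination `alpha * normalized |b| + (1 - alpha) * normalized frequency` with `alpha = 0.5` (the paper's exact formula is not given here); only the best 10% of sub-models, ranked by 5-fold RMSECV, contribute to the update; each bootstrap draws as many indices as there are variables with nonzero weight; PLS uses NIPALS with a fixed 3 latent variables; and Rp is taken as the Pearson correlation. All function and parameter names (`pls1_fit`, `rmsecv`, `bss_freq_coef`, `rmsep_rp`, `alpha`, `top_frac`, `n_models`, `n_iter`, `n_comp`) are illustrative, and the demo uses synthetic data, not the corn datasets.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model with NIPALS; return (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(min(n_comp, X.shape[1])):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # no covariance with y left to extract
            break
        w = w / norm
        t = Xk @ w  # latent scores
        tt = t @ t
        p, q = Xk.T @ t / tt, (yk @ t) / tt  # X loadings, y loading
        Xk, yk = Xk - np.outer(t, p), yk - q * t  # deflation
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P = np.column_stack(W), np.column_stack(P)
    coef = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return coef, y_mean - x_mean @ coef


def rmsecv(X, y, n_comp, n_folds=5):
    """Root mean square error of k-fold cross-validation (contiguous folds)."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef, b0 = pls1_fit(X[train], y[train], n_comp)
        sq_err += np.sum((X[test] @ coef + b0 - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))


def bss_freq_coef(X, y, n_iter=20, n_models=100, top_frac=0.1,
                  n_comp=3, alpha=0.5, seed=0):
    """Hypothetical soft-shrinkage loop; returns (best RMSECV, selected columns)."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    weights = np.full(n_vars, 1.0 / n_vars)
    best_err, best_vars = np.inf, np.arange(n_vars)
    for _ in range(n_iter):
        n_draw = np.count_nonzero(weights)
        # Weighted bootstrap: draw with replacement and keep unique indices,
        # so heavily weighted variables survive and the subsets shrink.
        subsets = [np.unique(rng.choice(n_vars, size=n_draw, p=weights))
                   for _ in range(n_models)]
        errors = np.array([rmsecv(X[:, s], y, n_comp) for s in subsets])
        coef_sum, freq = np.zeros(n_vars), np.zeros(n_vars)
        for k in np.argsort(errors)[:max(1, int(top_frac * n_models))]:
            s = subsets[k]
            coef, _ = pls1_fit(X[:, s], y, n_comp)
            total = np.abs(coef).sum()
            if total > 0:
                coef_sum[s] += np.abs(coef) / total  # normalized |b| per model
            freq[s] += 1  # how often each variable appears in the best models
            if errors[k] < best_err:
                best_err, best_vars = errors[k], s
        # ASSUMPTION: convex combination of normalized |b| and frequency.
        coef_part = coef_sum / coef_sum.sum() if coef_sum.sum() > 0 else freq / freq.sum()
        weights = alpha * coef_part + (1 - alpha) * freq / freq.sum()
    return best_err, best_vars


def rmsep_rp(y_true, y_pred):
    """RMSEP and Rp (taken here as the Pearson correlation) on a prediction set."""
    rmsep = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmsep, np.corrcoef(y_true, y_pred)[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(80, 50))  # synthetic "spectra": 80 samples, 50 variables
    y = 2 * X[:, 3] - 1.5 * X[:, 17] + X[:, 40] + 0.1 * rng.normal(size=80)
    err, sel = bss_freq_coef(X[:60], y[:60])
    coef, b0 = pls1_fit(X[:60][:, sel], y[:60], 3)
    print("selected:", sel, "RMSEP, Rp:", rmsep_rp(y[60:], X[60:][:, sel] @ coef + b0))
```

On this synthetic example, where `y` depends only on columns 3, 17, and 40, the returned subset is expected to contain those three columns. The seed, data sizes, and iteration counts are arbitrary illustrative choices. In this sketch a variable that misses every top model gets zero weight and can never be drawn again, so gradual down-weighting eventually turns into hard elimination.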
Pages: 64-70
Page count: 6