Bootstrapping soft shrinkage variable selection method based on the combination of frequency and regression coefficient

Authors
Zhang F. [1]
Tang X. [1]
Tong A. [1]
Wang B. [1]
Wang J. [1]
Affiliation
[1] State Key Laboratory of Electrical Insulation & Power Equipment, Xi'an Jiaotong University, Xi'an
Keywords
Near infrared spectroscopy; Partial least squares; Wavelength selection; Weighted bootstrap sampling
DOI
10.19650/j.cnki.cjsi.J1905728
Abstract
Fourier transform infrared spectrometers produce a very large number of spectral lines, and using all of them directly in multiple linear regression easily leads to over-fitting, poor stability, and long analysis times. To address these problems, this paper proposes a bootstrap soft shrinkage variable selection method based on the combination of frequency and regression coefficient. The method selects variables according to their weights: in each iteration, a new weight is calculated for every variable from its regression coefficient and its frequency, and soft shrinkage of the variable set is realized through weighted bootstrap sampling. The method was verified on infrared spectrum datasets of corn. On the corn oil dataset, the root mean square error of prediction (RMSEP) and the correlation coefficient (Rp) are 0.0202 and 0.9765, respectively, and the number of variables is reduced from the original 700 to 13. On the corn protein dataset, the RMSEP and Rp are 0.0279 and 0.9968, respectively, and the number of variables is reduced from the original 700 to 16. The results show that the proposed variable selection algorithm can select fewer and more precise variables and has practical application value. © 2020, Science Press. All rights reserved.
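The weight update and weighted bootstrap sampling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the exact formula, so the sketch makes several assumptions: "frequency" is taken as the fraction of retained sub-models in which a variable appears, the coefficient term is the sum of normalized absolute regression coefficients over those sub-models, and the two are combined by a simple product. The names `update_weights` and `weighted_bootstrap` are hypothetical, and fitting the partial least squares sub-models themselves is left out.

```python
import random


def update_weights(submodels, n_vars):
    """Compute new variable weights from the retained sub-models.

    submodels: list of dicts mapping variable index -> regression coefficient.
    Assumption: weight = frequency * summed normalized |coefficient|;
    the source abstract does not state the actual combination rule.
    """
    freq = [0.0] * n_vars   # fraction of sub-models containing each variable
    coef = [0.0] * n_vars   # summed normalized absolute coefficients
    for model in submodels:
        total = sum(abs(b) for b in model.values()) or 1.0
        for j, b in model.items():
            freq[j] += 1.0 / len(submodels)
            coef[j] += abs(b) / total
    raw = [f * c for f, c in zip(freq, coef)]
    s = sum(raw)
    if s == 0.0:
        # No information yet: fall back to equal weights.
        return [1.0 / n_vars] * n_vars
    return [r / s for r in raw]


def weighted_bootstrap(weights, n_draws, rng):
    """Draw variables with replacement according to their weights and keep
    the unique ones, so low-weight variables drop out gradually rather than
    being removed outright (the "soft shrinkage")."""
    drawn = rng.choices(range(len(weights)), weights=weights, k=n_draws)
    return sorted(set(drawn))
```

In a full loop, each iteration would draw many variable subsets with `weighted_bootstrap`, fit a sub-model on each, keep the best ones, and pass their coefficients to `update_weights`. Because duplicates are discarded after sampling, the number of distinct variables tends to shrink from one iteration to the next.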
Pages: 64-70
Page count: 6