Multisource Latent Feature Selective Ensemble Modeling Approach for Small-Sample High-Dimensional Process Data in Applications

Cited by: 2
Authors
Tang, Jian [1, 2]
Zhang, Jian [3 ]
Yu, Gang [4 ]
Zhang, Wenping [5 ]
Yu, Wen [6 ]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Key Lab Computat Intelligence & Intellige, Beijing 100124, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Peoples R China
[4] State Beijing Key Lab Proc Automat Min & Met, Beijing 102600, Peoples R China
[5] Shandong Gold Min Technol Co Ltd, Met Lab Branch, Jinan 250014, Peoples R China
[6] CINVESTAV IPN Natl Polytech Inst, Dept Control Automat, Mexico City 07360, DF, Mexico
Funding
US National Science Foundation; Beijing Natural Science Foundation;
Keywords
Feature extraction; Data models; Adaptation models; Pollution measurement; Training; Data mining; Analytical models; Multisource feature extraction; multi-layered feature selection; selective ensemble modeling; hyperparameter selection; high dimensional process data; EXTREME LEARNING-MACHINE; OPTIMIZATION; PARAMETERS; VIBRATION; SIZE;
DOI
10.1109/ACCESS.2020.3015875
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Several difficult-to-measure production quality or environmental pollution indices of industrial processes must be measured with offline laboratory instruments. Soft measurement methods are often used to predict such parameters online. Because of the limitations of the measurement devices and the complexity of the processes, only small-sample modeling data with high-dimensional input features can be obtained. This study therefore proposes a new multisource latent feature selective ensemble (SEN) modeling approach. First, the input features are divided into subgroups according to the characteristics of the modeling data. Second, the multisource latent features are obtained through multi-layered selection algorithms, which are applied to each subgroup and governed, in order, by the feature reduction ratio, the feature contribution ratio, and the mutual information value. Third, an adaptive hyperparameter selection algorithm based on a multi-step grid search is applied to the reduced features to construct candidate submodels. Finally, the optimized ensemble submodels and their weighting strategy are adaptively determined to build the final SEN model. The proposed method is verified on benchmark near-infrared data, high-dimensional mechanical frequency spectrum data, and industrial dioxin emission concentration data.
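The abstract does not spell out the multi-step grid search, so the following is only a generic coarse-to-fine sketch of the idea, not the authors' algorithm. The function name `multi_step_grid_search`, its parameters, and the toy loss are all hypothetical; in a setting like the one described, `loss` would presumably be the validation error of a candidate submodel for a given hyperparameter value.

```python
from typing import Callable, Tuple


def multi_step_grid_search(
    loss: Callable[[float], float],
    low: float,
    high: float,
    points_per_step: int = 5,
    steps: int = 3,
) -> Tuple[float, float]:
    """Coarse-to-fine search for one hyperparameter on [low, high].

    Hypothetical illustration: each step evaluates an evenly spaced grid,
    then shrinks the interval to one grid spacing on either side of the
    best point found so far.
    """
    if points_per_step < 3 or steps < 1 or not low < high:
        raise ValueError("need points_per_step >= 3, steps >= 1, low < high")

    best_x, best_loss = low, loss(low)
    for _ in range(steps):
        spacing = (high - low) / (points_per_step - 1)
        grid = [low + i * spacing for i in range(points_per_step)]
        for x in grid:
            value = loss(x)
            if value < best_loss:
                best_x, best_loss = x, value
        # Refine: keep one grid cell on each side of the best point,
        # clipped to the current search bounds.
        low, high = max(low, best_x - spacing), min(high, best_x + spacing)
    return best_x, best_loss


if __name__ == "__main__":
    # Toy loss with its minimum at 0.3 (an assumption for demonstration only).
    x, value = multi_step_grid_search(lambda v: (v - 0.3) ** 2, 0.0, 1.0)
    print(f"best hyperparameter ~ {x:.4f}, loss = {value:.6f}")
```

Each refinement step narrows the interval around the current best point, so the grid resolution improves without evaluating a dense grid over the whole range. The trade-off is that a coarse first grid can miss a narrow optimum.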
Pages: 148475-148488
Number of pages: 14