Research on Feature Selection Based on Hybrid Evolutionary Algorithm

Authors
Gao H.-M. [1]
Wang Y.-H. [2]
Bian C. [1]
Li X.-T. [1]
Affiliations
[1] School of Artificial Intelligence, Jilin University, Changchun, Jilin
[2] School of Artificial Intelligence, Hebei University of Technology, Tianjin
Funding
National Natural Science Foundation of China
Keywords
classification; feature selection; gene expression data; local search; new Wrapper hybrid feature selection algorithm; teaching and learning-based optimization algorithm
DOI
10.12263/DZXB.20210399
Abstract
Feature selection (FS) is an effective data pre-processing method that mitigates the curse of dimensionality caused by data redundancy by selecting a subset of features with high relevance and low redundancy from high-dimensional data. Many computational methods have been applied to the FS problem; among them, feature selection models based on the teaching and learning-based optimization (TLBO) algorithm have received increasing attention because of their efficient global search capability. However, as data sets grow, the limitations of these algorithms, such as model instability, low accuracy, and poor local search ability, have increasingly hindered further progress. To address these problems, this paper proposes a hybrid evolutionary Wrapper model, the Teaching and Learning-Based Optimization-Local Search algorithm (TLBOLS), which integrates TLBO with local search methods. First, the algorithm converts real-valued encoding to binary encoding in the initialization phase, introduces a worst-individual restart mechanism in the teaching phase, and uses different teaching factor (TF) values for the two roles of learner and teacher during evolution, yielding a binary teaching-learning feature selection algorithm, the Binary Teaching and Learning-Based Optimization-Local Search algorithm (BTLBOLS). Next, a local search method combining multiple operators with variable neighborhood search is proposed, which gradually increases the perturbation strength and improves the quality of individuals across the whole population. Finally, BTLBOLS uses a comprehensive evaluation metric as the objective function to guide the overall evolutionary process.
Forty-five high-dimensional cancer gene expression datasets are used for testing, and BTLBOLS is compared with ten feature selection algorithms. The experimental results show that BTLBOLS has advantages over the other algorithms in both classification accuracy and number of selected features, effectively improving classification performance. © 2023 Chinese Institute of Electronics. All rights reserved.
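The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, not the authors' BTLBOLS. It assumes a sigmoid transfer function for the real-to-binary conversion, the standard TLBO teaching update with a teaching factor drawn from {1, 2}, a random re-initialization as the worst-individual restart, and a weighted sum of accuracy and feature reduction as the evaluation metric. The names `binarize`, `fitness`, `teaching_step`, `alpha`, and `toy_accuracy` are hypothetical, and the local search stage and greedy acceptance of improved individuals are omitted.

```python
import numpy as np

def binarize(position, rng):
    """Map a real-valued position to a 0/1 feature mask (assumed sigmoid transfer)."""
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.random(position.shape) < prob).astype(int)

def fitness(mask, accuracy_fn, alpha=0.9):
    """Assumed objective: weighted sum of accuracy and fraction of features removed."""
    n_selected = int(mask.sum())
    if n_selected == 0:
        return 0.0  # an empty feature subset cannot be classified
    reduction = 1.0 - n_selected / mask.size
    return alpha * accuracy_fn(mask) + (1.0 - alpha) * reduction

def teaching_step(positions, scores, rng):
    """Standard TLBO teaching update plus an assumed worst-individual restart."""
    teacher = positions[np.argmax(scores)]   # best individual acts as teacher
    mean = positions.mean(axis=0)            # class mean, per dimension
    tf = rng.integers(1, 3)                  # teaching factor, 1 or 2
    r = rng.random(positions.shape)
    moved = positions + r * (teacher - tf * mean)
    worst = np.argmin(scores)
    # Restart the worst individual at a fresh random position (assumed form).
    moved[worst] = rng.uniform(-1.0, 1.0, size=positions.shape[1])
    return moved

# Toy usage: 6 individuals, 10 features, the first 3 treated as informative.
rng = np.random.default_rng(0)
positions = rng.uniform(-1.0, 1.0, size=(6, 10))
informative = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
toy_accuracy = lambda mask: float((mask & informative).sum()) / informative.sum()

masks = np.array([binarize(p, rng) for p in positions])
scores = np.array([fitness(m, toy_accuracy) for m in masks])
positions = teaching_step(positions, scores, rng)
```

In a real Wrapper setting, `toy_accuracy` would be replaced by the cross-validated accuracy of a classifier trained on the selected columns, and `alpha` controls how strongly smaller feature subsets are rewarded.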
Pages: 1619-1636
Page count: 17