Clustering-Guided Particle Swarm Feature Selection Algorithm for High-Dimensional Imbalanced Data With Missing Values

Cited by: 59
Authors
Zhang, Yong [1]
Wang, Yan-Hu [1]
Gong, Dun-Wei [1]
Sun, Xiao-Yan [1]
Affiliation
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Class imbalance; feature selection (FS); fuzzy clustering; missing value; particle swarm optimization (PSO); SENSITIVE FEATURE-SELECTION; MUTUAL INFORMATION; DIFFERENTIAL EVOLUTION; GENETIC ALGORITHM; OPTIMIZATION; CLASSIFICATION; MACHINE
DOI
10.1109/TEVC.2021.3106975
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers because such data are ubiquitous in real-world applications. However, there is still a lack of FS algorithms designed for data that exhibit both characteristics. Because missing data and class imbalance are intricately coupled, a better FS method for such data is essential. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on FS performance under class imbalance. Then, taking the RF-measure as the objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific components, namely a swarm initialization strategy guided by fuzzy clustering and a local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. Experimental results on several public datasets show that, compared with state-of-the-art FS algorithms, PSOFS-FC achieves excellent classification performance with relatively short running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
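The abstract names the RF-measure as the objective but does not define its filling-risk term, so what follows is only a minimal Python sketch of the conventional F-measure that the RF-measure builds on, computed for the minority (positive) class. The function name `f_measure` and its `positive` parameter are illustrative assumptions, not taken from the paper, and the filling-risk adjustment is deliberately not reproduced.

```python
def f_measure(y_true, y_pred, positive=1):
    """Conventional F-measure (F1) for the class labelled `positive`.

    Under class imbalance, `positive` is usually the minority class.
    This is the standard metric only; the paper's filling-risk
    adjustment (RF-measure) is NOT implemented here.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives for `positive`.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = [0] * 8 + [1, 1]          # 8 majority samples, 2 minority samples
    all_majority = [0] * 10            # classifier that ignores the minority class
    print(f_measure(y_true, all_majority))  # 0.0, although accuracy is 0.8
    mixed = [0] * 7 + [1, 1, 0]        # one hit, one false alarm, one miss
    print(f_measure(y_true, mixed))         # 0.5
```

The usage lines illustrate why an F-measure, rather than accuracy, is a natural basis for scoring feature subsets on imbalanced data: always predicting the majority class reaches 0.8 accuracy here but an F-measure of 0.0.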
Pages: 616-630
Number of pages: 15