Ant-Based Feature and Instance Selection for Multiclass Imbalanced Data

Authors
Villuendas-Rey, Yenny [1]
Yanez-Marquez, Cornelio [2]
Camacho-Nieto, Oscar [1]
Affiliations
[1] Inst Politecn Nacl, Ctr Innovac & Desarrollo Tecnol Computo, Mexico City 07700, Mexico
[2] Inst Politecn Nacl, Ctr Invest Comp, Mexico City 07738, Mexico
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Rough sets; Classification algorithms; Training; Metadata; Metaheuristics; Information systems; Nearest neighbor methods; Ant colony optimization; Algorithm design and theory; Multiclass imbalanced data; feature selection; instance selection; nearest neighbor; EVOLUTIONARY INSTANCE; ALGORITHMS; INFORMATION; SOFTWARE; SETS;
DOI
10.1109/ACCESS.2024.3418669
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper introduces a novel algorithm called Ant-based Feature and Instance Selection. The algorithm addresses the simultaneous selection of instances and features for mixed, incomplete, and imbalanced data in the context of lazy instance-based classifiers. It uses a hybrid selection strategy based on metaheuristic procedures and Rough Sets, combining Ant Colony Optimization with Generic Extended Rough Sets for Mixed and Incomplete Information Systems. It has five stages: reduct computation, metadata computation, intelligent instance preprocessing, submatrices creation, and fusion. To test the performance of the proposed algorithm, we used 25 datasets from the Machine Learning Repository of the University of California at Irvine. All of these datasets are imbalanced and multiclass, with between three and eight classes, and represent real-world classification problems. Most of them also have mixed or incomplete descriptions. We used several performance measures and computed the Instance Retention ratio and the Feature Retention ratio. To determine whether there were significant differences in the performance of the compared algorithms, we used non-parametric hypothesis testing. The statistical analysis results confirm the high quality of the proposed algorithm for selecting features and instances in multiclass imbalanced data.
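As a rough illustration of the two retention measures named in the abstract, the sketch below assumes each ratio is simply the number of retained items divided by the original number. The abstract does not state the formulas, so this definition, the function name `retention_ratios`, and its arguments are all hypothetical and not taken from the paper.

```python
def retention_ratios(n_instances, n_features, kept_instances, kept_features):
    """Return (instance_retention, feature_retention) as fractions in [0, 1].

    Assumed definition (not confirmed by the abstract):
        retention = number of retained items / original number of items
    kept_instances and kept_features are iterables of retained indices.
    """
    if n_instances <= 0 or n_features <= 0:
        raise ValueError("original dataset must have instances and features")
    # Use sets so that a repeated index is only counted once.
    instance_retention = len(set(kept_instances)) / n_instances
    feature_retention = len(set(kept_features)) / n_features
    return instance_retention, feature_retention


# Hypothetical example: 100 instances with 10 features,
# of which 25 instances and 3 features are retained.
ir, fr = retention_ratios(100, 10, range(25), [0, 3, 7])
```

Under this assumed definition, lower values mean a stronger reduction of the training set, which is usually weighed against the classifier performance measures.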
Pages: 133952 - 133968
Page count: 17
Related papers
50 in total
  • [1] Iterative ensemble feature selection for multiclass classification of imbalanced microarray data
    Yang, Junshan
    Zhou, Jiarui
    Zhu, Zexuan
    Ma, Xiaoliang
    Ji, Zhen
    JOURNAL OF BIOLOGICAL RESEARCH-THESSALONIKI, 2016, 23
  • [2] FISA: Feature-based instance selection for imbalanced text classification
    Sun, Aixin
    Lim, Ee-Peng
    Benatallah, Boualem
    Hassan, Mahbub
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 250 - 254
  • [3] Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning
    Liu, Zhenqiu
    Bensmail, Halima
    Tan, Ming
    EVOLUTIONARY BIOINFORMATICS, 2012, 8 : 197 - 205
  • [4] Cluster-Based Instance Selection for the Imbalanced Data Classification
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT II, 2018, 11056 : 191 - 200
  • [5] Feature Selection in Imbalanced Data
    Kamalov F.
    Thabtah F.
    Leung H.H.
    Annals of Data Science, 2023, 10 (06) : 1527 - 1541
  • [6] A Classification Method Based on Feature Selection for Imbalanced Data
    Liu, Yi
    Wang, Yanzhen
    Ren, Xiaoguang
    Zhou, Hao
    Diao, Xingchun
    IEEE ACCESS, 2019, 7 : 81794 - 81807
  • [7] Imbalanced Data Classification Based on Feature Selection Techniques
    Ksieniewicz, Pawel
    Wozniak, Michal
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING (IDEAL 2018), PT II, 2018, 11315 : 296 - 303
  • [8] Undersampling Instance Selection for Hybrid and Incomplete Imbalanced Data
    Camacho-Nieto, Oscar
    Yanez-Marquez, Cornelio
    Villuendas-Rey, Yenny
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2020, 26 (06) : 698 - 719
  • [9] Univariate feature selection on imbalanced data
    Chatterjee, Avishek
    Woodruff, Henry
    Lobbes, Marc
    Vallieres, Martin
    Seuntjens, Jan
    MEDICAL PHYSICS, 2019, 46 (11) : 5375 - 5375
  • [10] Causal Feature Selection With Imbalanced Data
    Ling, Zhaolong
    Wu, Jingxuan
    Zhang, Yiwen
    Zhou, Peng
    Yu, Kui
    Jiang, Bingbing
    Wu, Xindong
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024,