Ant-Based Feature and Instance Selection for Multiclass Imbalanced Data

Cited by: 0
Authors
Villuendas-Rey, Yenny [1]
Yanez-Marquez, Cornelio [2]
Camacho-Nieto, Oscar [1]
Affiliations
[1] Inst Politecn Nacl, Ctr Innovac & Desarrollo Tecnol Computo, Mexico City 07700, Mexico
[2] Inst Politecn Nacl, Ctr Invest Comp, Mexico City 07738, Mexico
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Rough sets; Classification algorithms; Training; Metadata; Metaheuristics; Information systems; Nearest neighbor methods; Ant colony optimization; Algorithm design and theory; Multiclass imbalanced data; feature selection; instance selection; nearest neighbor; EVOLUTIONARY INSTANCE; ALGORITHMS; INFORMATION; SOFTWARE; SETS;
DOI
10.1109/ACCESS.2024.3418669
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper introduces a novel algorithm called Ant-based Feature and Instance Selection. The algorithm addresses the simultaneous selection of instances and features for mixed, incomplete, and imbalanced data in the context of lazy instance-based classifiers. It uses a hybrid selection strategy based on metaheuristic procedures and Rough Sets, combining Ant Colony Optimization with Generic Extended Rough Sets for Mixed and Incomplete Information Systems. It has five stages: reduct computation, metadata computation, intelligent instance preprocessing, submatrices creation, and fusion. To test the performance of the proposed algorithm, we used 25 datasets from the Machine Learning repository of the University of California at Irvine. All these datasets are imbalanced, have multiple classes, and represent real-world classification problems. The number of classes ranges from three to eight. Most of the datasets also have mixed or incomplete descriptions. We used several performance measures and computed the Instance Retention ratio and the Feature Retention ratio. To determine whether the compared algorithms differ significantly in performance, we used non-parametric hypothesis testing. The statistical analysis results confirm the high quality of the proposed algorithm for selecting features and instances in multiclass imbalanced data.
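The abstract names the Instance Retention ratio and the Feature Retention ratio without defining them. A minimal sketch follows, assuming the common reading in which each ratio is the number of retained items divided by the number of original items. The function name `retention_ratio` and the example counts are hypothetical and are not taken from the paper.

```python
def retention_ratio(selected_count: int, original_count: int) -> float:
    """Fraction of items kept after selection.

    Assumption: ratio = retained / original, the usual definition in the
    data-reduction literature; the paper may define it differently.
    """
    if original_count <= 0:
        raise ValueError("original_count must be positive")
    if not 0 <= selected_count <= original_count:
        raise ValueError("selected_count must lie in [0, original_count]")
    return selected_count / original_count


# Hypothetical counts, for illustration only (not results from the paper).
instance_retention = retention_ratio(selected_count=120, original_count=480)
feature_retention = retention_ratio(selected_count=6, original_count=15)

print(f"Instance Retention ratio: {instance_retention:.2f}")
print(f"Feature Retention ratio: {feature_retention:.2f}")
```

Under this reading, lower values mean stronger reduction, so a selection method is usually judged by how little classifier performance it gives up for a given ratio.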
Pages: 133952 - 133968
Page count: 17