Self-paced ensemble and big data identification: a classification of substantial imbalance computational analysis

Cited by: 0
Authors
Bano, Shahzadi [1]
Zhi, Weimei [1]
Qiu, Baozhi [1]
Raza, Muhammad [2]
Sehito, Nabila [3]
Kamal, Mian Muhammad [4]
Aldehim, Ghadah [5]
Alruwais, Nuha [6]
Affiliations
[1] Zhengzhou Univ, Sch Comp & Artificial Intelligence, 100 Sci Ave, Zhengzhou 450001, Peoples R China
[2] Xian Technol Univ, Xian, Peoples R China
[3] Zhengzhou Univ, Sch Elect Informat Engn, 100 Sci Ave, Zhengzhou 450001, Henan, Peoples R China
[4] Southeast Univ, Sch Elect Sci & Engn, Joint Int Res Lab Informat Display & Visualizat, Nanjing 210018, Peoples R China
[5] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Syst, POB 84428, Riyadh 11671, Saudi Arabia
[6] King Saud Univ, Coll Appl Studies & Community Serv, Dept Comp Sci & Engn, POB 22459, Riyadh 11495, Saudi Arabia
Source
JOURNAL OF SUPERCOMPUTING | 2024 / Vol. 80 / Issue 07
Keywords
Self-paced ensemble; Big data; Classification; Computational; Simulation; Substantial imbalance
DOI
10.1007/s11227-023-05828-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This research paper focuses on the challenges of learning classifiers from the large-scale, highly imbalanced datasets that are prevalent in many real-world applications. Traditional learning algorithms often fall short in both performance and computational efficiency when dealing with imbalanced data. Factors such as class imbalance, noise, and class overlap make it difficult to learn effective classifiers. In this study, we propose a novel self-paced ensemble framework for classifying imbalanced data. The framework employs under-sampling to self-harmonize data hardness and build a robust ensemble. Extensive experimental testing demonstrates promising results in handling overlapping classes and skewed distributions while maintaining computational efficiency. The self-paced ensemble method addresses the challenges of high imbalance ratios, class overlap, and the presence of noise in large-scale imbalanced classification problems. By incorporating knowledge of these challenges into our learning framework, we establish the concept of a classification hardness distribution. We present the self-paced ensemble as a revolutionary learning paradigm for massive imbalance categorization, capable of improving the performance of existing learning algorithms on imbalanced data and providing better results for future applications.
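The abstract describes the framework only at a high level, so the following is a minimal sketch of one way hardness-aware under-sampling can feed an ensemble; it is not the authors' implementation. Everything concrete here is an assumption made for illustration: the toy nearest-centroid base learner, the hardness measure (the ensemble's predicted minority probability on a majority sample), the equal-width hardness bins, the tangent schedule for `alpha`, and all function and parameter names.

```python
import math
import random


def fit_centroid(X, y):
    """Toy base learner (assumption): nearest-centroid scorer returning P(y=1)."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0 = mean([x for x, t in zip(X, y) if t == 0])
    c1 = mean([x for x, t in zip(X, y) if t == 1])

    def proba(x):
        # Closer to the minority centroid c1 gives a score nearer to 1.
        return 1.0 / (1.0 + math.exp(math.dist(x, c1) - math.dist(x, c0)))

    return proba


def ensemble_proba(models, x):
    """Average the minority-class scores of all base models."""
    return sum(m(x) for m in models) / len(models)


def self_paced_ensemble(X_maj, X_min, n_estimators=5, k_bins=4, seed=0):
    """Hypothetical sketch: build an ensemble from hardness-binned under-samples."""
    rng = random.Random(seed)
    n_min = len(X_min)  # assumed to be no larger than len(X_maj)

    def train(subset):
        return fit_centroid(subset + X_min, [0] * len(subset) + [1] * n_min)

    # First model: plain random under-sampling of the majority class.
    models = [train(rng.sample(X_maj, n_min))]
    for i in range(1, n_estimators):
        # Hardness of a majority sample: how strongly the current ensemble
        # mistakes it for the minority class (|F(x) - 0|).
        hard = [ensemble_proba(models, x) for x in X_maj]
        lo, hi = min(hard), max(hard)
        width = (hi - lo) / k_bins or 1.0
        bins = [[] for _ in range(k_bins)]
        for x, h in zip(X_maj, hard):
            bins[min(int((h - lo) / width), k_bins - 1)].append((x, h))
        # Self-paced factor grows with i, flattening the weights across bins.
        alpha = math.tan(math.pi * i / (2 * n_estimators))
        weights = [
            1.0 / (sum(h for _, h in b) / len(b) + alpha) if b else 0.0
            for b in bins
        ]
        total = sum(weights)
        subset = []
        for b, w in zip(bins, weights):
            take = min(len(b), round(n_min * w / total))
            subset += [x for x, _ in rng.sample(b, take)]
        if not subset:  # fallback for very small minority classes
            subset = rng.sample(X_maj, n_min)
        models.append(train(subset))
    return models


if __name__ == "__main__":
    demo = random.Random(1)
    majority = [[demo.gauss(0, 1), demo.gauss(0, 1)] for _ in range(400)]
    minority = [[demo.gauss(3, 0.5), demo.gauss(3, 0.5)] for _ in range(20)]
    spe = self_paced_ensemble(majority, minority)
    print(ensemble_proba(spe, [3.0, 3.0]))
```

In this sketch each round re-scores the majority class with the ensemble built so far, so the under-sample shifts as the ensemble changes; a real system would likely swap the toy centroid scorer for a stronger base classifier.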
Pages: 9848-9869
Page count: 22
Related papers
50 entries in total
  • [1] Self-paced ensemble and big data identification: a classification of substantial imbalance computational analysis
    Shahzadi Bano
    Weimei Zhi
    Baozhi Qiu
    Muhammad Raza
    Nabila Sehito
    Mian Muhammad Kamal
    Ghadah Aldehim
    Nuha Alruwais
    The Journal of Supercomputing, 2024, 80 : 9848 - 9869
  • [2] Self-paced Ensemble for Highly Imbalanced Massive Data Classification
    Liu, Zhining
    Cao, Wei
    Gao, Zhifeng
    Bian, Jiang
    Chen, Hechang
    Chang, Yi
    Liu, Tie-Yan
    2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 841 - 852
  • [3] Self-paced ensemble learning for speech and audio classification
    Ristea, Nicolae-Catalin
    Ionescu, Radu Tudor
    INTERSPEECH 2021, 2021, : 2836 - 2840
  • [4] Self-Paced Clustering Ensemble
    Zhou, Peng
    Du, Liang
    Liu, Xinwang
    Shen, Yi-Dong
    Fan, Mingyu
    Li, Xuejun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (04) : 1497 - 1511
  • [5] Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification
    Zhou, Fang
    Gao, Suting
    Ni, Lyu
    Pavlovski, Martin
    Dong, Qiwen
    Obradovic, Zoran
    Qian, Weining
    DATA MINING AND KNOWLEDGE DISCOVERY, 2022, 36 (05) : 1601 - 1622
  • [6] Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification
    Fang Zhou
    Suting Gao
    Lyu Ni
    Martin Pavlovski
    Qiwen Dong
    Zoran Obradovic
    Weining Qian
    Data Mining and Knowledge Discovery, 2022, 36 : 1601 - 1622
  • [7] Active Clustering Ensemble With Self-Paced Learning
    Zhou, Peng
    Sun, Bicheng
    Liu, Xinwang
    Du, Liang
    Li, Xuejun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12186 - 12200
  • [8] Improved Classification Model for Peptide Identification Based on Self-paced Learning
    Wang, Yongxiang
    Liang, Xijun
    Xia, Zhonghang
    Niu, Xinnan
    Link, Andrew J.
    Yin, Haiqing
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 258 - 261
  • [9] Deep Reconciled and Self-Paced TSK Fuzzy System Ensemble for Imbalanced Data Classification: Architecture, Interpretability, and Theory
    Zhang, Yuanpeng
    Wang, Guanjin
    Zhou, Ta
    Ren, Ge
    Lam, Saikit
    Ding, Weiping
    Cai, Jing
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (11) : 6185 - 6198
  • [10] Self-paced Learning for Imbalanced Data
    Zieba, Maciej
    Tomczak, Jakub M.
    Swiatek, Jerzy
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2016, PT I, 2016, 9621 : 564 - 573