Exploratory parallel hybrid sampling framework for imbalanced data classification

Times cited: 0
Authors
Zheng, Ming [3, 4]
Zhao, Zhuo [3]
Wang, Fei [3]
Hu, Xiaowen [3]
Xu, Sheng [3, 4]
Li, Wanggen [3]
Li, Tong [1, 2]
Affiliations
[1] Yunnan Agr Univ, Big Data Sch, Kunming 650201, Peoples R China
[2] Yunnan Agr Univ, Key Lab Crop Prod & Smart Agr Yunnan Prov, Kunming 650201, Peoples R China
[3] Anhui Normal Univ, Sch Comp & Informat, Wuhu 241002, Peoples R China
[4] Anhui Prov Key Lab Ind Intelligence Data Secur, Wuhu 241002, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Oversampling; Undersampling; Parallel hybrid sampling framework; Serial hybrid sampling frameworks; ENSEMBLE; SMOTE
DOI
10.1016/j.engappai.2024.109428
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Current engineering application scenarios often face the challenge of imbalanced data. Hybrid sampling is an effective way to handle imbalanced data classification: it avoids the overfitting that an oversampling approach can cause when used alone and the mistaken deletion of useful majority samples that an undersampling approach can cause when used alone. However, most current hybrid sampling approaches are implemented serially, and running the oversampling and undersampling approaches one after the other causes them to interfere with and influence each other. This study proposes a parallel hybrid sampling framework based on the idea of parallel engineering and theoretically analyzes its superiority. Experimental results show that, when applied to five classification algorithms and evaluated with three performance metrics, the proposed framework outperforms the two mainstream hybrid sampling frameworks. Moreover, the proposed framework can effectively reduce the time consumption of the hybrid sampling process.
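The abstract contrasts serial hybrid sampling, where one sampler runs on the output of the other, with a parallel arrangement. The Python below is a minimal sketch of that contrast under loudly stated assumptions. It is not the authors' framework. The samplers are swapped-in stand-ins: a SMOTE-style interpolating oversampler and an edited-nearest-neighbours-style cleaner restricted to the majority class. "Parallel" is read here as both samplers working independently on the original training set before their outputs are merged. All function names (`smote`, `enn_majority`, `serial_hybrid`, `parallel_hybrid`), the binary labels, and the toy data are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

MIN, MAJ = 1, 0  # assumption: binary labels, 1 = minority class, 0 = majority class


def nearest(i, X, k):
    """Indices of the k points in X closest to X[i], excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def smote(X, y, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment between a
    random minority sample and one of its k nearest minority neighbours.
    Returns only the synthetic points (needs at least two minority samples)."""
    rng = rng or random.Random(0)
    mins = [x for x, lab in zip(X, y) if lab == MIN]
    synth = []
    for _ in range(n_new):
        i = rng.randrange(len(mins))
        j = rng.choice(nearest(i, mins, k))
        gap = rng.random()
        synth.append(tuple(a + gap * (b - a) for a, b in zip(mins[i], mins[j])))
    return synth


def enn_majority(X, y, k=3):
    """ENN-style undersampling restricted to the majority class: keep a majority
    sample only if at most half of its k nearest neighbours are minority."""
    return [X[i] for i in range(len(X))
            if y[i] == MAJ and sum(y[j] == MIN for j in nearest(i, X, k)) <= k // 2]


def serial_hybrid(X, y, n_new, k=3, seed=0):
    """Oversample first, then clean the majority class of the *augmented* set,
    so the cleaning step sees (and is influenced by) the synthetic points."""
    synth = smote(X, y, n_new, k, random.Random(seed))
    X2, y2 = X + synth, y + [MIN] * len(synth)
    mins = [x for x, lab in zip(X2, y2) if lab == MIN]
    maj = enn_majority(X2, y2, k)
    return mins + maj, [MIN] * len(mins) + [MAJ] * len(maj)


def parallel_hybrid(X, y, n_new, k=3, seed=0):
    """Run both samplers independently on the *original* data, then merge.
    Threads only show that the steps are independent; under CPython's GIL a
    real speed-up for this CPU-bound work would need separate processes."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        over = pool.submit(smote, X, y, n_new, k, random.Random(seed))
        under = pool.submit(enn_majority, X, y, k)
        synth, maj = over.result(), under.result()
    mins = [x for x, lab in zip(X, y) if lab == MIN] + synth
    return mins + maj, [MIN] * len(mins) + [MAJ] * len(maj)


# Hypothetical toy data: six majority points near the origin, three minority points.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (0.3, 0.2), (0.5, 0.5), (0.6, 0.4),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
y = [MAJ] * 6 + [MIN] * 3
Xs, ys = serial_hybrid(X, y, n_new=3)
Xp, yp = parallel_hybrid(X, y, n_new=3)
print("serial:", ys.count(MIN), "minority /", ys.count(MAJ), "majority")
print("parallel:", yp.count(MIN), "minority /", yp.count(MAJ), "majority")
```

In the serial variant the cleaning step measures neighbourhoods on a set that already contains synthetic minority points, which is one concrete way the two steps can influence each other. In the parallel variant neither step sees the other's output, so the two can run concurrently.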
Pages: 13
Related papers (50 in total)
  • [31] Hybrid Sampling Technique for Imbalanced Android Malware Family Classification
    Chauhan, Kshamta
    Gandotra, Ekta
    SN Computer Science, 6 (3)
  • [32] Imbalanced Data Stream Classification Using Hybrid Data Preprocessing
    Bobowska, Barbara
    Klikowski, Jakub
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 402 - 413
  • [33] Noise-free sampling with majority framework for an imbalanced classification problem
    Firdausanti, Neni Alya
    Mendonca, Israel
    Aritsugi, Masayoshi
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (07) : 4011 - 4042
  • [34] An Active Under-sampling Approach for Imbalanced Data Classification
    Yang, Zeping
    Gao, Daqi
    2012 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2012), VOL 2, 2012 : 270 - 273
  • [35] DOSS: Dual Over Sampling Strategy for Imbalanced Data Classification
    Wang, Qiushi
    Lee, Kee Jin
    Hong, Jihoon
    IECON 2018 - 44TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, 2018 : 5389 - 5394
  • [36] Imbalanced data classification: Using transfer learning and active sampling
    Liu, Yang
    Yang, Guoping
    Qiao, Shaojie
    Liu, Meiqi
    Qu, Lulu
    Han, Nan
    Wu, Tao
    Yuan, Guan
    Peng, Yuzhong
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 117
  • [37] Combine Sampling Support Vector Machine for Imbalanced Data Classification
    Sain, Hartayuni
    Purnami, Santi Wulan
    THIRD INFORMATION SYSTEMS INTERNATIONAL CONFERENCE 2015, 2015, 72 : 59 - 66
  • [38] Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning
    Tyagi, Shivani
    Mittal, Sangeeta
    PROCEEDINGS OF RECENT INNOVATIONS IN COMPUTING, ICRIC 2019, 2020, 597 : 209 - 221
  • [39] A Dynamic Sampling Framework for Multi-Class Imbalanced Data
    Debowski, B.
    Areibi, S.
    Grewal, G.
    Tempelman, J.
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 2, 2012 : 113 - 118
  • [40] Deep Over-sampling Framework for Classifying Imbalanced Data
    Ando, Shin
    Huang, Chun Yuan
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT I, 2017, 10534 : 770 - 785