ASE: Anomaly scoring based ensemble learning for highly imbalanced datasets

Cited by: 6
Authors
Liang, Xiayu [1]
Gao, Ying [1]
Xu, Shanrong [1]
Affiliations
[1] South China Univ Technol, Guangzhou 510006, Peoples R China
Keywords
Ensemble learning; Imbalanced datasets; Resampling; Anomaly detection; Bagging; SAMPLING METHOD; DATA-SETS; CLASSIFICATION; SMOTE; STACKING
DOI
10.1016/j.eswa.2023.122049
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification algorithms are now applied across many industries to solve real-world problems. However, in many binary classification tasks, minority-class samples make up only a small fraction of all instances, so the resulting datasets usually suffer from a high imbalance ratio. When faced with such skewed data, existing models sometimes treat minority-class samples as noise or ignore them as outliers. To solve this problem, we propose ASE (Anomaly Scoring Based Ensemble Learning), a bagging ensemble learning framework. The framework includes a scoring system based on anomaly detection algorithms, which guides the resampling strategy by dividing the majority-class samples into subspaces. A specific number of instances is then under-sampled from each subspace and combined with the minority class to construct subsets. The weights of the base classifiers trained on these subsets are calculated from the classification results of the anomaly detection model and the statistics of the subspaces. Experiments show that our ensemble learning model can dramatically improve the performance of base classifiers and is more efficient than other existing methods across a wide range of imbalance ratios, data scales and data dimensions. ASE can be combined with various classifiers, and every part of our framework is shown to be reasonable and necessary.
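The resampling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the anomaly detector, how subspaces are formed, how many instances are drawn from each, or how classifier weights are computed. The sketch assumes a stand-in score (distance to the majority centroid), equal-sized score bins as subspaces, and an equal quota per bin; all function names and parameters are hypothetical. Base classifier training and weighting are omitted.

```python
import math
import random

def anomaly_scores(majority):
    """Stand-in anomaly score (assumption): distance to the majority centroid."""
    dim = len(majority[0])
    centroid = [sum(x[d] for x in majority) / len(majority) for d in range(dim)]
    return [math.dist(x, centroid) for x in majority]

def split_into_subspaces(majority, scores, n_subspaces):
    """Sort majority samples by score and cut them into roughly equal bins."""
    order = sorted(range(len(majority)), key=lambda i: scores[i])
    size = math.ceil(len(order) / n_subspaces)
    return [[majority[i] for i in order[k:k + size]]
            for k in range(0, len(order), size)]

def build_subsets(majority, minority, n_subspaces=4, n_subsets=5, seed=0):
    """Build training subsets: under-sampled majority plus the full minority."""
    rng = random.Random(seed)
    scores = anomaly_scores(majority)
    subspaces = split_into_subspaces(majority, scores, n_subspaces)
    # Assumed quota: spread a minority-sized budget evenly over the subspaces.
    per_bin = max(1, len(minority) // len(subspaces))
    subsets = []
    for _ in range(n_subsets):
        sampled = []
        for members in subspaces:
            sampled.extend(rng.sample(members, min(per_bin, len(members))))
        X = sampled + list(minority)
        y = [0] * len(sampled) + [1] * len(minority)  # 0 = majority, 1 = minority
        subsets.append((X, y))
    return subsets

# Synthetic demo data: 200 majority points and 8 minority points in 2-D.
demo_rng = random.Random(42)
majority = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
minority = [(demo_rng.gauss(3, 0.5), demo_rng.gauss(3, 0.5)) for _ in range(8)]
subsets = build_subsets(majority, minority)
```

Each returned subset could then be used to train one base classifier of a bagging ensemble. In practice the centroid-distance score would be replaced by a real anomaly detection model.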
Pages: 9