Impact of Clustering on a Synthetic Instance Generation in Imbalanced Data Streams Classification

Cited by: 1
Authors
Czarnowski, Ireneusz [1]
Martins, Denis Mayr Lima [2]
Affiliations
[1] Gdynia Maritime Univ, Dept Informat Syst, Morska 83, PL-81225 Gdynia, Poland
[2] Univ Munster, Dept Informat Syst, ERCIS, Leonardo Campus 3, D-48149 Munster, Germany
Keywords
Classification; Learning from data streams; Imbalanced data; Over-sampling; Clustering; SMOTE
DOI
10.1007/978-3-031-08754-7_63
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper proposes a new version of the Weighted Ensemble with one-class Classification and Over-sampling and Instance selection (WECOI) algorithm. It describes WECOI and presents an alternative approach to over-sampling, based on selecting reference instances from the produced clusters. The approach is flexible with respect to the clustering method applied; a similarity-based clustering algorithm is proposed as the core, but other methods may be used instead. The approach has been validated experimentally with different clustering methods, showing how the clustering technique may influence synthetic instance generation and the performance of WECOI. WECOI has also been compared with other algorithms dedicated to learning from imbalanced data streams. The computational experiments were carried out on several selected benchmark datasets, and their results are presented and discussed.
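The idea of generating synthetic instances from cluster-derived reference instances can be illustrated with a minimal sketch. This is not the WECOI procedure from the paper: it assumes plain k-means in place of the similarity-based clustering, assumes the reference instance of a cluster is the real instance nearest its centroid, and assumes SMOTE-style linear interpolation toward a reference. The names `kmeans`, `reference_instances`, and `oversample` are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the paper's clustering)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment so labels match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def reference_instances(X, centroids, labels):
    """Assumption: one reference per non-empty cluster, the instance nearest its centroid."""
    refs = []
    for j, c in enumerate(centroids):
        members = X[labels == j]
        if len(members) > 0:
            refs.append(members[np.linalg.norm(members - c, axis=1).argmin()])
    return np.array(refs)

def oversample(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority instances by interpolating
    between random minority instances and cluster reference instances."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min))
    centroids, labels = kmeans(X_min, k, seed=seed)
    refs = reference_instances(X_min, centroids, labels)
    base = X_min[rng.integers(len(X_min), size=n_new)]
    target = refs[rng.integers(len(refs), size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return base + gap * (target - base)

if __name__ == "__main__":
    minority = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.5, 0.4]])
    print(oversample(minority, n_new=4, k=2, seed=1))
```

Swapping `kmeans` for another clustering routine changes which reference instances are chosen and therefore where the synthetic instances fall, which is the kind of effect the abstract says the experiments examine.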
Pages: 586 - 597
Number of pages: 12
Related papers
50 in total
  • [21] A semi-supervised clustering-based classification model for classifying imbalanced data streams in the presence of scarcely labelled data
    Bhowmick K.
    Narvekar M.
    International Journal of Business Intelligence and Data Mining, 2022, 20 (02) : 170 - 191
  • [22] Imbalanced Data Classification Using SVM Based on Improved Simulated Annealing Featuring Synthetic Data Generation and Reduction
    Hussein, Hussein Ibrahim
    Anwar, Said Amirul
    Ahmad, Muhammad Imran
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (01): : 547 - 564
  • [23] An Approach to Imbalanced Data Classification Based on Instance Selection and Over-Sampling
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT I, 2019, 11683 : 601 - 610
  • [24] Anomaly Detection Aided Budget Online Classification for Imbalanced Data Streams
    Liang, Xijun
    Song, Xiaoxin
    Qi, Kai
    Liu, Jinyu
    Jian, Ling
    Li, Jundong
    IEEE INTELLIGENT SYSTEMS, 2021, 36 (03) : 14 - 22
  • [25] Instance selection improves geometric mean accuracy: a study on imbalanced data classification
    Kuncheva, Ludmila I.
    Arnaiz-Gonzalez, Alvar
    Diez-Pastor, Jose-Francisco
    Gunn, Iain A. D.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2019, 8 (02) : 215 - 228
  • [26] Clustering-based incremental learning for imbalanced data classification
    Liu, Yuxin
    Du, Guangyu
    Yin, Chenke
    Zhang, Haichao
    Wang, Jia
    KNOWLEDGE-BASED SYSTEMS, 2024, 292
  • [27] Instance selection improves geometric mean accuracy: a study on imbalanced data classification
    Ludmila I. Kuncheva
    Álvar Arnaiz-González
    José-Francisco Díez-Pastor
    Iain A. D. Gunn
    Progress in Artificial Intelligence, 2019, 8 : 215 - 228
  • [28] Dynamic Classification Ensembles for Handling Imbalanced Multiclass Drifted Data Streams
    Madkour A.H.
    Abdelkader H.M.
    Mohammed A.M.
    Information Sciences, 2024, 670
  • [29] Clustering-based incremental learning for imbalanced data classification
    Liu, Yuxin
    Du, Guangyu
    Yin, Chenke
    Zhang, Haichao
    Wang, Jia
    Knowledge-Based Systems, 2024, 292
  • [30] IBLStreams: a system for instance-based classification and regression on data streams
    Shaker, Ammar
    Huellermeier, Eyke
    EVOLVING SYSTEMS, 2012, 3 (04) : 235 - 249