Impact of Clustering on a Synthetic Instance Generation in Imbalanced Data Streams Classification

Cited by: 1
Authors
Czarnowski, Ireneusz [1]
Martins, Denis Mayr Lima [2]
Affiliations
[1] Gdynia Maritime Univ, Dept Informat Syst, Morska 83, PL-81225 Gdynia, Poland
[2] Univ Munster, Dept Informat Syst, ERCIS, Leonardo Campus 3, D-48149 Munster, Germany
Keywords
Classification; Learning from data streams; Imbalanced data; Over-sampling; Clustering; SMOTE
DOI
10.1007/978-3-031-08754-7_63
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
This paper proposes a new version of the Weighted Ensemble with one-class Classification and Over-sampling and Instance selection (WECOI) algorithm. It describes WECOI and presents an alternative approach to over-sampling, based on selecting reference instances from the produced clusters. The approach does not depend on a particular clustering method, although a similarity-based clustering algorithm is proposed as its core. The approach has been validated experimentally with different clustering methods, showing how the clustering technique may influence synthetic instance generation and the performance of WECOI. WECOI has also been compared with other algorithms dedicated to learning from imbalanced data streams. The computational experiment was carried out on several selected benchmark datasets, and its results are presented and discussed.
Pages: 586 - 597
Page count: 12
Related papers
50 in total
  • [1] The impact of data difficulty factors on classification of imbalanced and concept drifting data streams
    Dariusz Brzezinski
    Leandro L. Minku
    Tomasz Pewinski
    Jerzy Stefanowski
    Artur Szumaczuk
    Knowledge and Information Systems, 2021, 63 : 1429 - 1469
  • [2] The impact of data difficulty factors on classification of imbalanced and concept drifting data streams
    Brzezinski, Dariusz
    Minku, Leandro L.
    Pewinski, Tomasz
    Stefanowski, Jerzy
    Szumaczuk, Artur
    KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 63 (06) : 1429 - 1469
  • [3] A synthetic neighborhood generation based ensemble learning for the imbalanced data classification
    Chen, Zhi
    Lin, Tao
    Xia, Xin
    Xu, Hongyan
    Ding, Sha
    APPLIED INTELLIGENCE, 2018, 48 (08) : 2441 - 2457
  • [4] Imbalanced Data Classification Based on Clustering
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION III, 2014, 443 : 741 - 745
  • [5] A synthetic neighborhood generation based ensemble learning for the imbalanced data classification
    Zhi Chen
    Tao Lin
    Xin Xia
    Hongyan Xu
    Sha Ding
    Applied Intelligence, 2018, 48 : 2441 - 2457
  • [6] MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    KNOWLEDGE-BASED SYSTEMS, 2015, 89 : 385 - 397
  • [7] Classification with local clustering in imbalanced data sets
    Ji, Hua
    Zhang, Huaxiang
    ADVANCED RESEARCH ON INFORMATION SCIENCE, AUTOMATION AND MATERIAL SYSTEM, PTS 1-6, 2011, 219-220 : 151 - 155
  • [8] Classwise Clustering for Classification of Imbalanced Text Data
    Swarnalatha, K.
    Guru, D. S.
    Anami, Basavaraj S.
    Suhil, Mahamad
    EMERGING RESEARCH IN ELECTRONICS, COMPUTER SCIENCE AND TECHNOLOGY, ICERECT 2018, 2019, 545 : 83 - 94
  • [9] The prior probability in the batch classification of imbalanced data streams
    Ksieniewicz, Pawel
    NEUROCOMPUTING, 2021, 452 : 309 - 316
  • [10] Cluster-Based Instance Selection for the Imbalanced Data Classification
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT II, 2018, 11056 : 191 - 200