Impact of Clustering on a Synthetic Instance Generation in Imbalanced Data Streams Classification

Cited by: 1
Authors
Czarnowski, Ireneusz [1]
Martins, Denis Mayr Lima [2]
Affiliations
[1] Gdynia Maritime Univ, Dept Informat Syst, Morska 83, PL-81225 Gdynia, Poland
[2] Univ Munster, Dept Informat Syst, ERCIS, Leonardo Campus 3, D-48149 Munster, Germany
Keywords
Classification; Learning from data streams; Imbalanced data; Over-sampling; Clustering; SMOTE
DOI
10.1007/978-3-031-08754-7_63
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper proposes a new version of the Weighted Ensemble with one-class Classification and Over-sampling and Instance selection (WECOI) algorithm. It describes WECOI and presents an alternative approach to over-sampling, based on selecting reference instances from the produced clusters. The approach is not tied to a particular clustering method: a similarity-based clustering algorithm is proposed as its core, but other methods may be applied. The approach has been validated experimentally with different clustering methods, showing how the clustering technique may influence synthetic instance generation and the performance of WECOI. WECOI has also been compared with other algorithms dedicated to learning from imbalanced data streams. The computational experiment was carried out on several selected benchmark datasets, and its results are presented and discussed.
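The abstract's idea of generating synthetic minority instances from cluster reference instances can be illustrated with a minimal sketch. This is not the authors' WECOI implementation: plain k-means stands in for the similarity-based clustering the paper proposes, the reference instance is assumed to be the cluster member nearest its centroid, and the SMOTE-style interpolation is likewise an assumption. The names `kmeans` and `oversample_from_clusters` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (a stand-in for any clustering method); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def oversample_from_clusters(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic instances from clusters of the minority class."""
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k, seed=seed)
    clusters = []
    for c in range(k):
        members = [p for p, l in zip(minority, labels) if l == c]
        if members:
            # Assumption: the reference instance is the member closest to the centroid.
            reference = min(members, key=lambda p: math.dist(p, centroids[c]))
            clusters.append((reference, members))
    synthetic = []
    for _ in range(n_new):
        reference, members = rng.choice(clusters)
        neighbour = rng.choice(members)
        gap = rng.random()  # SMOTE-style interpolation factor in [0, 1)
        synthetic.append([r + gap * (n - r) for r, n in zip(reference, neighbour)])
    return synthetic


minority = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
synthetic = oversample_from_clusters(minority, n_new=5, k=2)
```

Because each synthetic instance lies on the segment between a reference instance and another member of the same cluster, swapping the clustering method changes which regions of the feature space receive new instances, which is the effect the paper studies.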
Pages: 586-597
Page count: 12
Related papers
50 in total
  • [41] MCBC-SMOTE: A Majority Clustering Model for Classification of Imbalanced Data
    Arora, Jyoti
    Tushir, Meena
    Sharma, Keshav
    Mohan, Lalit
    Singh, Aman
    Alharbi, Abdullah
    Alosaimi, Wael
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (03): 4801 - 4817
  • [42] Artificial Data Generation with Language Models for Imbalanced Classification in Maintenance
    Usuga-Cadavid, Juan Pablo
    Grabot, Bernard
    Lamouri, Samir
    Fortin, Arnaud
    SERVICE ORIENTED, HOLONIC AND MULTI-AGENT MANUFACTURING SYSTEMS FOR INDUSTRY OF THE FUTURE, SOHOMA LATIN AMERICA 2021, 2021, 987 : 57 - 68
  • [43] CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification
    Koziarski, Michal
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [44] SPECTRAL CLUSTERING WITH IMBALANCED DATA
    Qian, Jing
    Saligrama, Venkatesh
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [45] Credal Clustering for Imbalanced Data
    Zhang, Zuowei
    Liu, Zhunga
    Zhou, Kuang
    Martin, Arnaud
    Zhang, Yiru
    BELIEF FUNCTIONS: THEORY AND APPLICATIONS (BELIEF 2021), 2021, 12915 : 13 - 21
  • [46] KernelADASYN: Kernel Based Adaptive Synthetic Data Generation for Imbalanced Learning
    Tang, Bo
    He, Haibo
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015: 664 - 671
  • [47] Clustering data streams
    Guha, S
    Mishra, N
    Motwani, R
    O'Callaghan, L
    41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000: 359 - 366
  • [48] Online Harmonizing Gradient Descent for Imbalanced Data Streams One-Pass Classification
    Zhou, Han
    Yin, Hongpeng
    Deng, Xuanhong
    Huang, Yuyu
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 2468 - 2475
  • [49] AN IMBALANCED DATA CLASSIFICATION METHOD BASED ON AUTOMATIC CLUSTERING UNDER-SAMPLING
    Deng, Xiaoheng
    Zhong, Weijian
    Ren, Ju
    Zeng, Detian
    Zhang, Honggang
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016
  • [50] CDBH: A clustering and density-based hybrid approach for imbalanced data classification
    Mirzaei, Behzad
    Nikpour, Bahareh
    Nezamabadi-pour, Hossein
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164