A Constructive Method for Data Reduction and Imbalanced Sampling

Times cited: 0
Authors
Liu, Fei [1]
Yan, Yuanting [1]
Affiliations
[1] Anhui Univ, Artificial Intelligence Inst, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
constructive covering algorithm; data reduction; undersampling; class imbalance; instance selection; classification
DOI
10.1007/978-981-97-0798-0_28
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
A large volume of training data leads to high computational cost in instance-based classification. Currently, one of the mainstream approaches to reducing data size is to select a representative subset of samples based on spatial partitioning. However, selecting a representative subset while preserving the overall underlying distribution structure of the dataset remains a challenge. This paper therefore proposes a constructive data reduction method for classification problems, called Constructive Covering Sampling (CCS). CCS does not rely on any parameters. It iteratively partitions the original data space into a group of data subspaces, each of which contains several samples of the same class, and then selects representative samples from those subspaces. This not only preserves the original data distribution structure and reduces data size but also reduces problem complexity and improves the learning efficiency of the classifier. Furthermore, CCS can be extended into an effective undersampling method (CCUS) to address class imbalance. Experiments on 18 KEEL and UCI datasets demonstrate that the proposed method outperforms other sampling methods in terms of F-measure, G-mean, AUC, and accuracy.
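The covering-then-selecting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' CCS implementation: the abstract does not specify how covers are built or how representatives are chosen, so the sketch assumes a common constructive-covering scheme (each cover is a ball around a randomly chosen uncovered sample, bounded by the distance to the nearest sample of a different class) and assumes the representative is the cover member closest to the cover's centroid. The function name `constructive_cover_sample` and the `seed` argument are hypothetical.

```python
import math
import random


def constructive_cover_sample(X, y, seed=0):
    """Return sorted indices of one representative sample per cover.

    Assumed scheme (not taken from the paper): a cover is the set of
    uncovered same-class samples lying strictly closer to a chosen centre
    than the nearest sample of any other class.
    """
    rng = random.Random(seed)
    uncovered = set(range(len(X)))
    representatives = []
    while uncovered:
        # Pick an uncovered sample as the centre of a new cover.
        center = rng.choice(sorted(uncovered))
        # d1: distance from the centre to the nearest different-class sample
        # (infinite when the data contain a single class).
        d1 = min(
            (math.dist(X[center], X[j])
             for j in range(len(X)) if y[j] != y[center]),
            default=math.inf,
        )
        # The cover always holds its centre, plus the uncovered same-class
        # samples strictly inside radius d1.
        members = [center] + [
            j for j in sorted(uncovered)
            if j != center
            and y[j] == y[center]
            and math.dist(X[center], X[j]) < d1
        ]
        # Assumed selection rule: keep the member closest to the centroid.
        dim = len(X[center])
        centroid = [sum(X[j][k] for j in members) / len(members)
                    for k in range(dim)]
        rep = min(members, key=lambda j: math.dist(X[j], centroid))
        representatives.append(rep)
        uncovered.difference_update(members)
    return sorted(representatives)
```

Under the same assumptions, an undersampling variant in the spirit of CCUS could apply the selection only to majority-class covers and keep every minority-class sample; the abstract does not describe that step in detail, so it is left out of the sketch.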
Pages: 476-489
Page count: 14
Related papers
50 in total
  • [21] EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data
    Jung, Ilok
    Ji, Jaewon
    Cho, Changseob
    ELECTRONICS, 2022, 11 (09)
  • [22] A Progressive Sampling Method for Dual-Node Imbalanced Learning with Restricted Data Access
    Qiu, Yixuan
    Chen, Weitong
    Xu, Miao
    23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, ICDM 2023, 2023, : 508 - 517
  • [23] HYBS: A novel hybrid sampling method for learning from imbalanced data sets
    Liu, Zhiyong
    Yu, Hualong
    International Journal of Advancements in Computing Technology, 2012, 4 (10) : 281 - 288
  • [24] Imbalanced Data Set CSVM Classification Method Based on Cluster Boundary Sampling
    Li, Peng
    Liang, Tian-ge
    Zhang, Kai-hui
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [25] Clustering boundary over-sampling classification method for imbalanced data sets
    Lou, Xiao-Jun
    Sun, Yu-Xuan
    Liu, Hai-Tao
    Liu, H.-T. (liuhaitao@wsn.cn), 1600, Zhejiang University (47): : 944 - 950
  • [26] Deep Learning and Data Sampling with Imbalanced Big Data
    Johnson, Justin M.
    Khoshgoftaar, Taghi M.
    2019 IEEE 20TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2019), 2019, : 175 - 183
  • [27] A hybrid sampling method for highly imbalanced and overlapped data classification with complex distribution
    Liu, Yansong
    Zhu, Li
    Ding, Lei
    Sui, He
    Shang, Wenli
    INFORMATION SCIENCES, 2024, 661
  • [28] Constructive sample partition-based parameter-free sampling for class-overlapped imbalanced data classification
    Wang, Weiqing
    Yan, Yuanting
    Zhou, Peng
    Zhao, Shu
    Zhang, Yiwen
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [29] Data reduction and stacking for imbalanced data classification
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (06) : 7239 - 7249
  • [30] An evaluation of progressive sampling for imbalanced data sets
    Ng, Willie
    Dash, Manoranjan
    ICDM 2006: SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, WORKSHOPS, 2006, : 657 - +