A Constructive Method for Data Reduction and Imbalanced Sampling

Citations: 0
Authors
Liu, Fei [1]
Yan, Yuanting [1]
Affiliations
[1] Anhui Univ, Artificial Intelligence Inst, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
constructive covering algorithm; data reduction; undersampling; class imbalance; instance selection; classification
DOI
10.1007/978-981-97-0798-0_28
Chinese Library Classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
A large amount of training data leads to high computational cost in instance-based classification. Currently, one of the mainstream approaches to reducing data size is to select a representative subset of samples based on spatial partitioning. However, selecting a representative subset while preserving the overall underlying distribution structure of the dataset remains a challenge. Therefore, this paper proposes a constructive data reduction method for classification problems called Constructive Covering Sampling (CCS). CCS does not rely on any parameters. It iteratively partitions the original data space into a group of data subspaces, each containing several samples of the same class, and then selects representative samples from those subspaces. This not only preserves the original data distribution structure and reduces data size, but also reduces problem complexity and improves the learning efficiency of the classifier. Furthermore, CCS can be extended into an effective undersampling method (CCUS) to address class imbalance. Experiments on 18 KEEL and UCI datasets demonstrate that the proposed method outperforms other sampling methods in terms of F-measure, G-mean, AUC, and accuracy.
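The partition-then-select idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' CCS or CCUS implementation: it assumes Euclidean distance, visits candidate centers in index order, takes each cover's center as its representative, and treats undersampling as keeping all non-majority samples plus the majority-class representatives. The function names `covering_sample` and `covering_undersample` are hypothetical.

```python
import math


def covering_sample(X, y):
    """Return indices of one representative (the cover center) per cover.

    Assumption: a cover is the set of same-class samples that lie strictly
    closer to the center than the nearest sample of any other class.
    """
    n = len(X)
    covered = [False] * n
    representatives = []
    for c in range(n):
        if covered[c]:
            continue
        # Distance from the center to the nearest different-class sample.
        d_other = min(
            (math.dist(X[c], X[j]) for j in range(n) if y[j] != y[c]),
            default=math.inf,
        )
        # Mark every same-class sample inside the cover as covered.
        for j in range(n):
            if y[j] == y[c] and math.dist(X[c], X[j]) < d_other:
                covered[j] = True
        covered[c] = True
        representatives.append(c)
    return representatives


def covering_undersample(X, y, majority_label):
    """Keep all non-majority samples and only majority representatives."""
    reps = set(covering_sample(X, y))
    return [i for i in range(len(X)) if y[i] != majority_label or i in reps]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    print(covering_sample(X, y))          # one index per cover
    print(covering_undersample(X, y, 0))  # majority class 0 reduced
```

No radius or neighbor count is supplied by the caller, which mirrors the parameter-free claim in the abstract; each cover radius is derived from the data alone.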
Pages: 476-489
Number of pages: 14