A Constructive Method for Data Reduction and Imbalanced Sampling

Times cited: 0
Authors
Liu, Fei [1]
Yan, Yuanting [1]
Affiliations
[1] Anhui Univ, Artificial Intelligence Inst, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
constructive covering algorithm; data reduction; undersampling; class imbalance; instance selection; classification
DOI
10.1007/978-981-97-0798-0_28
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
A large volume of training data leads to high computational cost in instance-based classification. Currently, one of the mainstream approaches to reducing data size is to select a representative subset of samples based on spatial partitioning. However, selecting a representative subset while preserving the overall underlying distribution structure of the dataset remains a challenge. Therefore, this paper proposes a constructive data reduction method for classification problems, called Constructive Covering Sampling (CCS). CCS does not rely on any parameters. It iteratively partitions the original data space into a group of data subspaces, each containing several samples of the same class, and then selects representative samples from these subspaces. This not only preserves the original data distribution structure and reduces data size but also reduces problem complexity and improves the learning efficiency of the classifier. Furthermore, CCS can be extended into an effective undersampling method (CCUS) to address class imbalance. Experiments on 18 KEEL and UCI datasets demonstrate that the proposed method outperforms other sampling methods in terms of F-measure, G-mean, AUC, and accuracy.
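The partition-then-select idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' CCS or CCUS procedure: the abstract does not give the covering rule or the selection rule, so the sketch assumes a common form of constructive covering (a sphere around a randomly chosen uncovered sample, bounded by the nearest sample of a different class) and assumes that the representative of each cover is the member nearest the cover's centroid. The function names `constructive_covers`, `select_representatives`, and `undersample_majority` are hypothetical.

```python
import numpy as np

def constructive_covers(X, y, seed=0):
    """Split samples into disjoint same-class spherical covers (illustrative)."""
    rng = np.random.default_rng(seed)
    uncovered = np.ones(len(X), dtype=bool)
    covers = []
    while uncovered.any():
        # Assumption: the cover center is a randomly chosen uncovered sample.
        c = rng.choice(np.flatnonzero(uncovered))
        dist = np.linalg.norm(X - X[c], axis=1)
        other = y != y[c]
        # d1: distance from the center to the nearest sample of another class.
        d1 = dist[other].min() if other.any() else np.inf
        # Cover members: uncovered same-class samples strictly closer than d1.
        mask = uncovered & ~other & (dist < d1)
        mask[c] = True  # guard: duplicate points with conflicting labels
        members = np.flatnonzero(mask)
        covers.append(members)
        uncovered[members] = False
    return covers

def select_representatives(X, covers):
    """Assumption: keep the member nearest to each cover's centroid."""
    reps = []
    for members in covers:
        centroid = X[members].mean(axis=0)
        offsets = np.linalg.norm(X[members] - centroid, axis=1)
        reps.append(int(members[np.argmin(offsets)]))
    return np.array(sorted(reps), dtype=int)

def undersample_majority(X, y, majority_label, seed=0):
    """Undersampling variant: reduce only the majority class, keep the rest."""
    covers = constructive_covers(X, y, seed)
    majority_covers = [m for m in covers if y[m[0]] == majority_label]
    keep_major = select_representatives(X, majority_covers)
    keep_minor = np.flatnonzero(y != majority_label)
    return np.sort(np.concatenate([keep_major, keep_minor]))

# Toy data: three majority-class points and two minority-class points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0]])
y = np.array([0, 0, 0, 1, 1])
covers = constructive_covers(X, y)
reduced_idx = select_representatives(X, covers)
balanced_idx = undersample_majority(X, y, majority_label=0)
```

Both functions return indices into `X`, so the reduced training set is `X[reduced_idx], y[reduced_idx]`. The sketch has no tunable parameters other than the random seed, which only affects the order in which cover centers are drawn.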
Pages: 476-489
Number of pages: 14