A Constructive Method for Data Reduction and Imbalanced Sampling

Times cited: 0

Authors:

Liu, Fei [1]

Yan, Yuanting [1]

Affiliation:

[1] Anhui Univ, Artificial Intelligence Inst, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China

Source:

ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT III | 2024 / Vol. 14489

Funding:

National Natural Science Foundation of China;

Keywords:

constructive covering algorithm; data reduction; undersampling; class imbalance; INSTANCE SELECTION; CLASSIFICATION;

DOI:

10.1007/978-981-97-0798-0_28

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification code:

081202; 0835;

Abstract:

Large training sets lead to high computational cost in instance-based classification. One mainstream way to reduce data size is to select a representative subset of samples based on spatial partitioning. However, selecting a representative subset while preserving the overall underlying distribution structure of the dataset remains a challenge. This paper therefore proposes a constructive data reduction method for classification problems, called Constructive Covering Sampling (CCS). CCS does not rely on any parameter settings. It iteratively partitions the original data space into a group of data subspaces, each of which contains several samples of the same class, and then selects representative samples from these subspaces. This preserves the original data distribution structure while reducing data size, and it also reduces problem complexity and improves the learning efficiency of the classifier. Furthermore, CCS can be extended into an effective undersampling method (CCUS) for class-imbalanced problems. Experiments on 18 KEEL and UCI datasets show that the proposed method outperforms other sampling methods in terms of F-measure, G-mean, AUC, and accuracy.
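The abstract states only that CCS partitions the data into single-class subspaces and then picks representatives from them. The sketch below illustrates that idea under explicit assumptions that are not taken from the paper: the subspaces are built with a common variant of the constructive covering algorithm (CCA), where a random uncovered sample is the center and the radius is the midpoint between d1 (distance to the nearest other-class sample) and d2 (distance to the farthest same-class sample closer than d1); each cover's representative is taken to be the member nearest the cover centroid; and CCUS is approximated by keeping every minority sample and reducing only the other classes. The names `build_covers`, `representative`, `ccs_reduce`, and `ccus_undersample` are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
import math
import random
from typing import List, Sequence, Tuple

# (center index, radius, indices of the samples this cover absorbed)
Cover = Tuple[int, float, List[int]]


def build_covers(X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0) -> List[Cover]:
    """Partition the samples into single-class spheres (an assumed CCA-style rule)."""
    rng = random.Random(seed)
    n = len(X)
    uncovered = set(range(n))
    covers: List[Cover] = []
    while uncovered:
        c = rng.choice(sorted(uncovered))  # a random uncovered sample becomes the center
        dists = [math.dist(X[c], X[j]) for j in range(n)]
        # d1: distance to the nearest sample of any other class
        d1 = min((dists[j] for j in range(n) if y[j] != y[c]), default=math.inf)
        # d2: distance to the farthest same-class sample strictly closer than d1
        d2 = max((dists[j] for j in range(n) if y[j] == y[c] and dists[j] < d1), default=0.0)
        radius = (d1 + d2) / 2  # midpoint rule: no other-class sample lies strictly inside
        # Absorb uncovered same-class samples inside the sphere; the center always qualifies,
        # so the loop is guaranteed to terminate.
        members = sorted(j for j in uncovered if y[j] == y[c] and dists[j] <= radius)
        covers.append((c, radius, members))
        uncovered.difference_update(members)
    return covers


def representative(X: Sequence[Sequence[float]], members: List[int]) -> int:
    """Hypothetical selection rule: the member closest to the cover's centroid."""
    dim = len(X[members[0]])
    centroid = [sum(X[j][k] for j in members) / len(members) for k in range(dim)]
    return min(members, key=lambda j: math.dist(X[j], centroid))


def ccs_reduce(X, y, seed: int = 0) -> List[int]:
    """CCS-style reduction sketch: keep one representative index per cover."""
    return sorted(representative(X, m) for _, _, m in build_covers(X, y, seed))


def ccus_undersample(X, y, minority: int, seed: int = 0) -> List[int]:
    """CCUS-style sketch: keep every minority sample, reduce only the other classes."""
    kept = {i for i in range(len(y)) if y[i] == minority}
    for _, _, m in build_covers(X, y, seed):
        if y[m[0]] != minority:
            kept.add(representative(X, m))
    return sorted(kept)


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
    y = [0, 0, 0, 0, 1, 1]
    print(ccs_reduce(X, y))                    # one kept index per class here
    print(ccus_undersample(X, y, minority=1))  # both minority samples plus one majority representative
```

In this toy run the two clusters are far apart, so each class collapses into a single cover. When classes are tightly interleaved, most covers hold a single sample and the sketch reduces very little.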


Pages: 476-489

Page count: 14
