A Constructive Method for Data Reduction and Imbalanced Sampling

Citations: 0

Authors:

Liu, Fei [1]

Yan, Yuanting [1]

Affiliations:

[1] Anhui Univ, Artificial Intelligence Inst, Sch Comp Sci & Technol, Hefei 230601, Anhui, Peoples R China

Source:

ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT III | 2024 / Vol. 14489

Funding:

National Natural Science Foundation of China;

Keywords:

constructive covering algorithm; data reduction; undersampling; class imbalance; INSTANCE SELECTION; CLASSIFICATION;

DOI:

10.1007/978-981-97-0798-0_28

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

A large volume of training data leads to high computational cost in instance-based classification. Currently, one of the mainstream approaches to reducing data size is to select a representative subset of samples based on spatial partitioning. However, selecting a representative subset while preserving the overall underlying distribution structure of the dataset remains a challenge. Therefore, this paper proposes a constructive data reduction method called Constructive Covering Sampling (CCS) for classification problems. CCS does not rely on any parameters. It iteratively partitions the original data space into a group of data subspaces, each of which contains several samples of the same class, and then selects representative samples from these subspaces. This not only maintains the original data distribution structure and reduces data size but also reduces problem complexity and improves the learning efficiency of the classifier. Furthermore, CCS can be extended into an effective undersampling method (CCUS) to address class imbalance. Experiments on 18 KEEL and UCI datasets demonstrate that the proposed method outperforms other sampling methods in terms of F-measure, G-mean, AUC, and accuracy.
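
The abstract describes the idea only at a high level, so the following is a minimal illustrative sketch of a generic constructive-covering reduction, not the authors' CCS implementation. The function name `covering_reduce`, the random choice of cover centers, the radius rule (distance to the nearest sample of a different class), and the choice of the center as the representative are all assumptions made for illustration.

```python
import numpy as np


def covering_reduce(X, y, seed=0):
    """Hypothetical sketch: greedily cover the data with same-class balls
    and keep one sample (the ball's center) per cover.

    Returns the sorted indices of the retained samples.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    uncovered = np.ones(len(X), dtype=bool)
    kept = []
    while uncovered.any():
        # Pick a random uncovered sample as the center of a new cover.
        center = rng.choice(np.flatnonzero(uncovered))
        dist = np.linalg.norm(X - X[center], axis=1)
        other = y != y[center]
        # Assumed radius rule: distance to the nearest sample of a
        # different class (infinite if only one class is present).
        radius = dist[other].min() if other.any() else np.inf
        # The cover holds same-class samples strictly inside that radius.
        in_cover = (~other) & (dist < radius)
        in_cover[center] = True  # the center always belongs to its cover
        uncovered &= ~in_cover
        kept.append(center)
    return np.array(sorted(kept))


# Two well-separated clusters, one per class.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y = np.array([0, 0, 0, 1, 1, 1])
idx = covering_reduce(X, y)
```

On this toy data each class fits in a single cover, so one sample per class is retained. Under the same assumptions, an undersampling variant could apply the selection only to covers of the majority class and keep every minority sample.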

Pages: 476 - 489

Page count: 14

50 records in total

[11] HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
Chen, Liping
Jiang, Jiabao
Zhang, Yong
COMPLEXITY, 2021, 2021
[12] Imbalanced Data Over-Sampling Method Based on ISODATA Clustering
Lv, Zhenzhe
Liu, Qicheng
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (09) : 1528 - 1536
[13] An Effective Over-sampling Method for Imbalanced Data Sets Classification
Zhai Yun
Ma Nan
Ruan Da
An Bing
CHINESE JOURNAL OF ELECTRONICS, 2011, 20 (03) : 489 - 494
[14] HSDLM: A Hybrid Sampling With Deep Learning Method for Imbalanced Data Classification
Hasib, Khan Md
Towhid, Nurul Akter
Islam, Md Rafiqul
INTERNATIONAL JOURNAL OF CLOUD APPLICATIONS AND COMPUTING, 2021, 11 (04) : 1 - 13
[15] Under-sampling method based on sample weight for imbalanced data
Xiong B.
Wang G.
Deng W.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2016, 53 (11) : 2613 - 2622
[16] Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE
Chen, Junfeng
Zheng, Zhongtuan
Computer Engineering and Applications, 2024, 57 (23) : 106 - 112
[17] A Hybrid Under-Sampling Method (HUSBoost) to Classify Imbalanced Data
Popel, Mahmudul Hasan
Hasib, Khan Md
Habib, Syed Ahsan
Shah, Faisal Muhammad
2018 21ST INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2018,
[18] Neighbourhood sampling in bagging for imbalanced data
Blaszczynski, Jerzy
Stefanowski, Jerzy
NEUROCOMPUTING, 2015, 150 : 529 - 542
[19] Noise Reduction A Priori Synthetic Over-Sampling for class imbalanced data sets
Rivera, William A.
INFORMATION SCIENCES, 2017, 408 : 146 - 161
[20] AN IMBALANCED DATA CLASSIFICATION METHOD BASED ON AUTOMATIC CLUSTERING UNDER-SAMPLING
Deng, Xiaoheng
Zhong, Weijian
Ren, Ju
Zeng, Detian
Zhang, Honggang
2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,
