Difficulty Factors and Preprocessing in Imbalanced Data Sets: An Experimental Study on Artificial Data

Cited by: 17

Authors:

Wojciechowski S. [1]

Wilk S. [1]

Affiliation:

[1] Institute of Computing Science, Poznan University of Technology, Piotrowo 2, Poznan

Source:

1600 / Walter de Gruyter GmbH (vol.) / 42 (issue)

Keywords:

difficulty factors; imbalanced data; learning and classification; preprocessing methods

DOI:

10.1515/fcds-2017-0007

Abstract:

In this paper we describe the results of an experimental study in which we examined how various difficulty factors in imbalanced data sets affect the performance of selected classifiers, applied alone or combined with several preprocessing methods. We used artificial data sets to systematically examine factors such as dimensionality, class imbalance ratio, and the distribution of specific types of examples (safe, borderline, rare and outliers) in the minority class. The results revealed that the last factor was the most critical and that it exacerbated the other factors (in particular class imbalance). The best classification performance was achieved by non-symbolic classifiers, in particular k-NN classifiers (with 1 or 3 neighbors: 1NN and 3NN, respectively) and SVM. Moreover, they benefited from different preprocessing methods: SVM and 1NN worked best with undersampling, while oversampling was more beneficial for 3NN. © by Szymon Wilk 2017.

Pages: 149-176

Number of pages: 27

