Handling Imbalanced Dataset Using SVM and k-NN Approach

Cited by: 9

Authors:

Wah, Yap Bee ^{[1]}

Abd Rahman, Hezlin Aryani ^{[1]}

He, Haibo ^{[2,3]}

Bulgiba, Awang ^{[4]}

Affiliations:

[1] Univ Teknol MARA Malaysia, Fac Comp & Math Sci, Shah Alam 40450, Malaysia

[2] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA

[3] Julius Ctr Univ Malaya, Kuala Lumpur, Malaysia

[4] Univ Malaya, Fac Med, Dept Social & Prevent Med, Kuala Lumpur 50603, Malaysia

Source:

ADVANCES IN INDUSTRIAL AND APPLIED MATHEMATICS | 2016, Vol. 1750

Keywords:

data mining; classification; imbalanced data; SVM; k-NN

DOI:

10.1063/1.4954536

Chinese Library Classification number:

O29 [Applied Mathematics]

Discipline classification code:

070104

Abstract:

Data mining classification methods are affected when the data are imbalanced, that is, when one class of a two-class dependent variable is larger than the other. Many new methods have been developed to handle imbalanced datasets. For binary classification tasks, the Support Vector Machine (SVM) is one of the methods reported to give high accuracy in predictive modeling compared with other techniques such as Logistic Regression and Discriminant Analysis. The strengths of SVM are the robustness of its algorithm and its capability to integrate with kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular approach to handling imbalanced data is random sampling, such as random undersampling, random oversampling, and synthetic sampling. The application of Nearest Neighbours techniques in the sampling approach has been seen as having a bigger advantage than other methods, as it can handle both structured and non-structured data. Some studies implement an ensemble of SVM and Nearest Neighbours with good results. This paper discusses the various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real data set.
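The random undersampling and random oversampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the record does not describe their data set or settings, so the toy points, the function names (`random_undersample`, `random_oversample`, `knn_predict`), and the choice of k = 5 are all assumptions made for illustration. Only the Python standard library is used, and the SVM step is omitted; in practice a library classifier would normally replace the hand-written k-NN vote.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Duplicate random rows (with replacement) up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    def sq_dist(features):
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = sorted(zip(X_train, y_train), key=lambda pair: sq_dist(pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 8 majority points (class 0), 2 minority points (class 1).
X = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.9), (1.0, 1.0), (0.3, 0.4),
     (0.8, 0.1), (0.2, 0.6), (0.9, 0.7), (5.0, 5.0), (5.5, 4.8)]
y = [0] * 8 + [1] * 2

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

query = (5.2, 5.1)  # lies next to the minority points
print("class counts after oversampling: ", dict(Counter(y_over)))
print("class counts after undersampling:", dict(Counter(y_under)))
print("k=5 vote on original data:   ", knn_predict(X, y, query, k=5))
print("k=5 vote on oversampled data:", knn_predict(X_over, y_over, query, k=5))
```

With only two minority points, a 5-neighbour vote on the original data is always outvoted by the majority class, even for a query that sits beside the minority points. After oversampling, the duplicated minority points fill the neighbourhood and the vote changes. This is a deliberately small example of why resampling is paired with neighbour-based classifiers, not a reproduction of the paper's results.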

Pages: 8

