Handling Imbalanced Dataset Using SVM and k-NN Approach

Cited by: 9
Authors
Wah, Yap Bee [1]
Abd Rahman, Hezlin Aryani [1]
He, Haibo [2, 3]
Bulgiba, Awang [4]
Affiliations
[1] Univ Teknol MARA Malaysia, Fac Comp & Math Sci, Shah Alam 40450, Malaysia
[2] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[3] Julius Ctr Univ Malaya, Kuala Lumpur, Malaysia
[4] Univ Malaya, Fac Med, Dept Social & Prevent Med, Kuala Lumpur 50603, Malaysia
Keywords
data mining; classification; imbalanced data; SVM; k-NN
DOI
10.1063/1.4954536
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Data mining classification methods are affected when the data is imbalanced, that is, when one class of a two-class dependent variable is larger than the other. Many new methods have been developed to handle imbalanced datasets. For binary classification tasks, the Support Vector Machine (SVM) is one of the methods reported to give high accuracy in predictive modeling compared to other techniques such as Logistic Regression and Discriminant Analysis. The strengths of SVM are the robustness of its algorithm and its ability to integrate with kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular way to handle imbalanced data is random sampling, such as random undersampling, random oversampling and synthetic sampling. Applying Nearest Neighbours techniques within a sampling approach is regarded as having a bigger advantage than other methods, as it can handle both structured and non-structured data. Some studies implement an ensemble of SVM and Nearest Neighbours with good results. This paper discusses the various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real data set.
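As a rough illustration of the sampling ideas named in the abstract, the sketch below shows random undersampling and random oversampling on a toy two-class dataset, followed by a plain k-NN majority vote. It is not the authors' procedure: the function names, the toy points and the choice of k are assumptions made for the example, and the SVM step is omitted so that the code needs only the Python standard library.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    groups = {}
    for features, label in zip(X, y):
        groups.setdefault(label, []).append(features)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows (with replacement) so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, query, k=3):
    """Predict by majority vote among the k nearest rows (squared Euclidean distance)."""
    def distance(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], query))

    neighbours = sorted(zip(X_train, y_train), key=distance)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
X = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.1), (0.2, 0.8),
     (0.9, 0.9), (0.3, 0.4), (0.7, 0.6), (0.1, 0.5),
     (5.0, 5.0), (5.5, 5.2)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print("original:    ", Counter(y))
print("undersampled:", Counter(y_under))
print("oversampled: ", Counter(y_over))
print("k-NN label for (5.2, 5.1):", knn_predict(X_over, y_over, (5.2, 5.1), k=3))
```

Undersampling discards majority-class information, while oversampling repeats minority rows and can encourage overfitting; synthetic sampling, also mentioned in the abstract, generates new minority points instead and is not shown here.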
Pages: 8
Related papers
50 in total
  • [21] Feature projection k-NN classifier model for imbalanced and incomplete medical data
    Porwik, Piotr
    Orczyk, Tomasz
    Lewandowski, Marcin
    Cholewa, Marcin
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2016, 36 (04) : 644 - 656
  • [22] Monitoring Baby State While Sleeping Using K-NN and M-SVM Classifiers
    Nosseir, Ann
    El Araby, Omar
    PROCEEDINGS OF 2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND INFORMATION ENGINEERING (ICSIE 2019), 2019, : 263 - 267
  • [23] Arabic Islamic Manuscripts Digitization based on Hybrid K-NN/SVM Approach and Cloud Computing Technologies
    Hassen, Hamdi
    Khemakhem, Maher
    2013 TAIBAH UNIVERSITY INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION TECHNOLOGY FOR THE HOLY QURAN AND ITS SCIENCES, 2013, : 366 - 371
  • [24] A Supervised Approach on Gurmukhi Word Sense Disambiguation Using k-NN Method
    Walla, Himdweep
    Rana, Ajay
    Kansal, Vineet
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE CONFLUENCE 2018 ON CLOUD COMPUTING, DATA SCIENCE AND ENGINEERING, 2018, : 743 - 746
  • [25] Comparison of SVM and k-NN classifiers in the estimation of the state of the arteriovenous fistula problem
    Grochowina, Marcin
    Leniowska, Lucyna
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 249 - 254
  • [26] Optimizing HAR Systems: Comparative Analysis of Enhanced SVM and k-NN Classifiers
    Shdefat, Ahmed Younes
    Mostafa, Nour
    Al-Arnaout, Zakwan
    Kotb, Yehia
    Alabed, Samer
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [27] Defining the Features of EMG Signals on the Forearm of the Hand Using SVM, RF, k-NN Classification Algorithms
    Turgunov, Adilbek
    Zohirov, Kudratjon
    Ganiyev, Alisher
    Sharopova, Barno
    2020 INFORMATION COMMUNICATION TECHNOLOGIES CONFERENCE (ICTC), 2020, : 260 - 264
  • [28] Robust Classification of Primary Brain Tumor in Computer Tomography Images Using K-NN and Linear SVM
    Sundararaj, G. Kharmega
    Balamurugan, V.
    2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014, : 1315 - 1319
  • [29] Early Flood Risk Assessment using Machine Learning: A Comparative study of SVM, Q-SVM, K-NN and LDA
    Khan, Talha Ahmed
    Shahid, Zeeshan
    Alam, Muhammad
    Su'ud, M. M.
    Kadir, Kushsairy
    2019 13TH INTERNATIONAL CONFERENCE ON MATHEMATICS, ACTUARIAL SCIENCE, COMPUTER SCIENCE AND STATISTICS (MACS-13), 2019,
  • [30] On k-NN method with preprocessing
    University of Information Technology and Management, H. Sucharskiego 2, 35-225 Rzeszow, Poland
    Unknown
    Fundam Inf, 2006, 3 (343-358):