Handling Imbalanced Dataset Using SVM and k-NN Approach

Cited by: 9
Authors
Wah, Yap Bee [1]
Abd Rahman, Hezlin Aryani [1]
He, Haibo [2, 3]
Bulgiba, Awang [4]
Affiliations
[1] Universiti Teknologi MARA Malaysia, Faculty of Computer and Mathematical Sciences, Shah Alam 40450, Malaysia
[2] University of Rhode Island, Department of Electrical, Computer and Biomedical Engineering, Kingston, RI 02881, USA
[3] Julius Centre University of Malaya, Kuala Lumpur, Malaysia
[4] University of Malaya, Faculty of Medicine, Department of Social and Preventive Medicine, Kuala Lumpur 50603, Malaysia
Keywords
data mining; classification; imbalanced data; SVM; k-NN
DOI
10.1063/1.4954536
CLC number (Chinese Library Classification)
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Data mining classification methods are affected when the data are imbalanced, that is, when, for a two-class dependent variable, one class is larger than the other. Many new methods have been developed to handle imbalanced datasets. For binary classification tasks, the Support Vector Machine (SVM) is one of the methods reported to give high predictive accuracy compared with other techniques such as Logistic Regression and Discriminant Analysis. The strengths of SVM are the robustness of its algorithm and its ability to integrate kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular way to handle imbalanced data is sampling, such as random undersampling, random oversampling and synthetic sampling. Applying Nearest Neighbours techniques within the sampling approach has been seen as having a greater advantage over other methods because they can handle both structured and unstructured data. Some studies have implemented ensemble methods combining SVM and Nearest Neighbours, with good results. This paper discusses the various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real dataset.
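The abstract names three sampling remedies (random undersampling, random oversampling, synthetic sampling) and the use of nearest neighbours within sampling. As a minimal sketch, assuming two-class data stored as lists of numeric tuples, the Python below shows each remedy plus a plain k-NN majority-vote classifier. The synthetic step uses SMOTE-style interpolation between a minority point and one of its k nearest minority neighbours; this is a commonly used technique swapped in for illustration, not necessarily the procedure used in the paper. All function names, the 95:5 toy data and k=3 are hypothetical, and the SVM side is not sketched.

```python
import math
import random
from collections import Counter

# All function names below are hypothetical, chosen for illustration only.


def random_undersample(majority, minority, rng):
    """Randomly keep only len(minority) majority points so both classes match in size."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """Randomly duplicate minority points (with replacement) until both classes match in size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def k_nearest(index, points, k):
    """Indices of the k points closest to points[index] (Euclidean), excluding itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def synthetic_oversample(minority, n_new, k, rng):
    """SMOTE-style synthetic sampling: each new point lies on the segment between a
    random minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def knn_predict(query, points, labels, k):
    """Plain k-NN classifier: majority vote among the k training points nearest to query."""
    nearest = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))[:k]
    return Counter(labels[j] for j in nearest).most_common(1)[0][0]


# Toy 95:5 two-class data (made up for illustration, not the paper's dataset).
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(5)]

maj_u, min_u = random_undersample(majority, minority, rng)
maj_o, min_o = random_oversample(majority, minority, rng)
new_points = synthetic_oversample(minority, len(majority) - len(minority), k=3, rng=rng)
print(len(maj_u), len(min_u))  # 5 5
print(len(maj_o), len(min_o))  # 95 95
print(len(new_points))         # 90
```

Random undersampling discards majority-class information, while random oversampling only repeats existing minority points; interpolating between neighbours instead places new minority points inside the region the minority class already occupies. For real work, the imbalanced-learn package provides maintained implementations (RandomUnderSampler, RandomOverSampler, SMOTE).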
Pages: 8