A new sampling method for classifying imbalanced data based on support vector machine ensemble

Cited by: 96
Authors
Jian, Chuanxia [1]
Gao, Jian [1]
Ao, Yinhui [1]
Affiliations
[1] Guangdong Univ Technol, Sch Electromech Engn, Key Lab Mech Equipment Mfg & Control Technol, Minist Educ, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Sampling; Support vector machine; Classification
DOI
10.1016/j.neucom.2016.02.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Minority-class examples carry too little information to represent the inherent structure of a dataset, so existing classification methods predict the minority class with low accuracy. Over-sampling and under-sampling help to raise that accuracy, but they either add trivial information or discard important information, which in turn limits the prediction accuracy of the minority class. This paper therefore proposes a different contribution sampling (DCS) method, based on the different contributions that support vectors (SVs) and non-support vectors (NSVs) make to classification. DCS uses the biased support vector machine (B-SVM) to identify the SVs and NSVs of an imbalanced dataset and then samples them differently: the synthetic minority over-sampling technique (SMOTE) re-samples the SVs in the minority class, and random under-sampling (RUS) re-samples the NSVs in the majority class. Examples are labeled by an ensemble of support vector machines (SVMen). Experiments are carried out on imbalanced datasets selected from the UCI, AVU06a, Statlog, DP01a, JP98a and CWH03a repositories. The results show that DCS achieves better Receiver Operating Characteristic (ROC) curve performance than the other methods. In terms of the geometric mean prediction accuracy (G-mean), DCS improves on NS, US, SMOTE and ROS by 20.80%, 5.97%, 8.66% and 9.35%, respectively. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 115-122
Page count: 8
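The resampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the SV/NSV flags (`is_sv`) have already been obtained from some SVM, that minority NSVs and majority SVs are kept unchanged, and that the amounts `n_synthetic` and `n_keep_nsv` are chosen by the user; the abstract specifies none of these details. The function names `smote_like`, `dcs_resample` and `g_mean` are hypothetical, the interpolation is a simplified SMOTE-style step, and G-mean is computed with the commonly used definition (square root of sensitivity times specificity), which is assumed to match the paper's.

```python
import math
import random


def smote_like(points, n_new, k=3, rng=None):
    """Simplified SMOTE-style step: interpolate between a point and one of its k nearest neighbours."""
    rng = rng or random.Random(0)
    if not points:
        return []
    if len(points) == 1:
        # Nothing to interpolate with, so fall back to copies of the single point.
        return [list(points[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        others = [p for p in points if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def dcs_resample(X, y, is_sv, n_synthetic, n_keep_nsv, minority=1, rng=None):
    """Over-sample minority SVs and randomly under-sample majority NSVs.

    Assumption: minority NSVs and majority SVs are passed through unchanged.
    """
    rng = rng or random.Random(0)
    rows = list(zip(X, y, is_sv))
    minority_all = [x for x, lab, _ in rows if lab == minority]
    minority_svs = [x for x, lab, sv in rows if lab == minority and sv]
    majority_svs = [(x, lab) for x, lab, sv in rows if lab != minority and sv]
    majority_nsvs = [(x, lab) for x, lab, sv in rows if lab != minority and not sv]

    synthetic = smote_like(minority_svs, n_synthetic, rng=rng)
    kept = rng.sample(majority_nsvs, min(n_keep_nsv, len(majority_nsvs)))

    majority = majority_svs + kept
    X_new = minority_all + synthetic + [x for x, _ in majority]
    y_new = [minority] * (len(minority_all) + len(synthetic)) + [lab for _, lab in majority]
    return X_new, y_new


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

In practice the `is_sv` flags could come from a class-weighted SVM fitted on the original data, and the resampled set would then be used to train each member of the SVM ensemble; both steps are left out here because the abstract gives no further detail.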