A new sampling method for classifying imbalanced data based on support vector machine ensemble

Cited by: 96
Authors
Jian, Chuanxia [1]
Gao, Jian [1]
Ao, Yinhui [1]
Affiliations
[1] Guangdong Univ Technol, Sch Electromech Engn, Key Lab Mech Equipment Mfg & Control Technol, Minist Educ, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Sampling; Support vector machine; Classification
DOI
10.1016/j.neucom.2016.02.006
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Minority examples carry too little information to represent the inherent structure of a dataset, so existing classification methods predict the minority class with low accuracy. Over- and under-sampling methods help to increase the prediction accuracy of the minority class. However, these methods either lose important information or add trivial information for classification, which limits that accuracy. This paper therefore proposes a different contribution sampling (DCS) method based on the contributions that support vectors (SVs) and non-support vectors (NSVs) make to classification. The DCS method uses the biased support vector machine (B-SVM) to identify the SVs and NSVs of an imbalanced dataset and applies a different sampling method to each group: the synthetic minority over-sampling technique (SMOTE) re-samples the SVs in the minority class, and random under-sampling (RUS) re-samples the NSVs in the majority class. Examples are labeled by an ensemble of support vector machines (SVMen). Experiments are carried out on imbalanced datasets selected from the UCI, AVU06a, Statlog, DP01a, JP98a and CWH03a repositories. The results show that, on these datasets, the proposed DCS method achieves better Receiver Operating Characteristic (ROC) curve performance than the other methods. In terms of the geometric mean prediction accuracy (G-mean), the DCS method improves on NS, US, SMOTE and ROS by 20.80%, 5.97%, 8.66% and 9.35%, respectively. © 2016 Elsevier B.V. All rights reserved.
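The re-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the SVs and NSVs of each class have already been identified (the paper uses B-SVM for this, which is not reproduced here), it assumes the minority NSVs and majority SVs are kept unchanged, and it uses a simplified SMOTE-style interpolation. The function names `smote_oversample`, `random_undersample`, `dcs_resample` and `g_mean`, and all parameters, are hypothetical. The G-mean is computed with the usual definition, the square root of sensitivity times specificity.

```python
import math
import random


def smote_oversample(points, n_new, k=2, rng=None):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen point and one of its k nearest neighbours (SMOTE-style)."""
    if rng is None:
        rng = random.Random(0)
    if len(points) < 2:
        raise ValueError("need at least two points to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        others = [p for p in points if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def random_undersample(points, n_keep, rng=None):
    """Keep a random subset of at most n_keep points (RUS)."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(points, min(n_keep, len(points)))


def dcs_resample(min_sv, min_nsv, maj_sv, maj_nsv, n_synthetic, n_keep, seed=0):
    """Over-sample minority SVs and under-sample majority NSVs.

    Assumption (not stated in the abstract): minority NSVs and majority SVs
    are passed through unchanged.
    """
    rng = random.Random(seed)
    new_minority = (
        list(min_sv)
        + smote_oversample(list(min_sv), n_synthetic, rng=rng)
        + list(min_nsv)
    )
    new_majority = list(maj_sv) + random_undersample(list(maj_nsv), n_keep, rng=rng)
    return new_minority, new_majority


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)
```

In this sketch the re-sampled minority and majority sets would then be used to train the ensemble members, and `g_mean` would score their combined predictions; how the paper builds and combines the SVM ensemble is not specified in the abstract and is left out here.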
Pages: 115-122
Number of pages: 8
Related papers
50 in total
  • [41] Fuzzy support vector machine for imbalanced data with borderline noise
    Liu, Jie
    FUZZY SETS AND SYSTEMS, 2021, 413 : 64 - 73
  • [43] Evolutionary under-sampling based bagging ensemble method for imbalanced data classification
    Sun, Bo
    Chen, Haiyan
    Wang, Jiandong
    Xie, Hua
    FRONTIERS OF COMPUTER SCIENCE, 2018, 12 (02) : 331 - 350
  • [44] A method of classifying imbalanced credit data based on the AC-CTGAN hybrid sampling algorithm
    Chen, Tinggui
    Gu, Hailian
    Yang, Zhiyu
    Yang, Jianjun
    Wang, Bing
    JOURNAL OF CREDIT RISK, 2024, 20 (03):
  • [45] Classifying imbalanced data using ensemble of reduced kernelized weighted extreme learning machine
    Raghuwanshi, Bhagat Singh
    Shukla, Sanyam
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (11) : 3071 - 3097
  • [46] A new method to improve the sensitivity of support vector machine based on data optimization
    Zhan, Y
    Zhou, YH
    Lu, ZD
    2003 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS, INTELLIGENT SYSTEMS AND SIGNAL PROCESSING, VOLS 1 AND 2, PROCEEDINGS, 2003, : 892 - 896
  • [47] Support vector machine based ensemble classifier
    Hu, ZH
    Cai, YZ
    Li, Y
    Xu, XM
    ACC: PROCEEDINGS OF THE 2005 AMERICAN CONTROL CONFERENCE, VOLS 1-7, 2005, : 745 - 749
  • [48] A New Combination Sampling Method for Imbalanced Data
    Li, Hu
    Zou, Peng
    Wang, Xiang
    Xia, Rongze
    PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 547 - 554
  • [49] Integration of feature vector selection and support vector machine for classification of imbalanced data
    Liu, Jie
    Zio, Enrico
    APPLIED SOFT COMPUTING, 2019, 75 : 702 - 711
  • [50] A New Ensemble Model based Support Vector Machine for Credit Assessing
    Yao, Jianrong
    Lian, Cheng
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (06) : 159 - 167