A new sampling method for classifying imbalanced data based on support vector machine ensemble

Cited by: 96
Authors
Jian, Chuanxia [1]
Gao, Jian [1]
Ao, Yinhui [1]
Affiliations
[1] Guangdong Univ Technol, Sch Electromech Engn, Key Lab Mech Equipment Mfg & Control Technol, Minist Educ, Guangzhou 510006, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Sampling; Support vector machine; Classification
DOI
10.1016/j.neucom.2016.02.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Minority examples carry too little information to represent the inherent structure of a dataset accurately, so existing classification methods predict the minority class with low accuracy. Over- and under-sampling methods help to increase the prediction accuracy of the minority class. However, these methods either discard important information or add trivial information, which in turn affects the prediction accuracy of the minority class. This paper therefore proposes a different contribution sampling method (DCS) based on the different contributions that support vectors (SVs) and non-support vectors (NSVs) make to classification. DCS applies different sampling methods to the SVs and the NSVs, and uses the biased support vector machine (B-SVM) to identify the SVs and NSVs of an imbalanced dataset. The synthetic minority over-sampling technique (SMOTE) is used to re-sample the SVs in the minority class, and random under-sampling (RUS) is used to re-sample the NSVs in the majority class. Examples are labeled by an ensemble of support vector machines (SVMen). Experiments are carried out on imbalanced datasets selected from the UCI, AVU06a, Statlog, DP01a, JP98a and CWH03a repositories. The results show that, on these datasets, DCS achieves better Receiver Operating Characteristic (ROC) curve performance than the other methods. In terms of the geometric mean prediction accuracy (G-mean), DCS improves on NS, US, SMOTE and ROS by 20.80%, 5.97%, 8.66% and 9.35%, respectively. (C) 2016 Elsevier B.V. All rights reserved.
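The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions. The SV/NSV indicator `is_sv` is taken as given (in practice it might come from a class-weighted SVM, for example the `support_` attribute of scikit-learn's `SVC`). The function names `smote_like`, `dcs_resample` and `g_mean` are hypothetical. The interpolation step is a simplified stand-in for SMOTE. The balancing target (close half of the class gap by over-sampling minority SVs and the rest by dropping majority NSVs) is an illustrative choice the abstract does not specify. The G-mean is computed in its usual form, the square root of sensitivity times specificity.

```python
import math
import random


def smote_like(points, n_new, k=3, rng=random):
    """Simplified SMOTE: interpolate between a point and one of its k nearest neighbours."""
    if len(points) < 2 or n_new <= 0:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        others = [p for p in points if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def dcs_resample(X, y, is_sv, minority=1, seed=0):
    """Over-sample minority SVs and under-sample majority NSVs (illustrative only).

    Assumption: minority NSVs and majority SVs are kept unchanged.
    """
    rng = random.Random(seed)
    rows = list(zip(X, y, is_sv))
    min_sv = [x for x, lab, sv in rows if lab == minority and sv]
    min_all = [(x, lab) for x, lab, _ in rows if lab == minority]
    maj_sv = [(x, lab) for x, lab, sv in rows if lab != minority and sv]
    maj_nsv = [(x, lab) for x, lab, sv in rows if lab != minority and not sv]

    # Assumed balancing rule: split the class-size gap between the two steps.
    gap = max(0, len(maj_sv) + len(maj_nsv) - len(min_all))
    n_add = gap // 2
    n_drop = min(len(maj_nsv), gap - n_add)

    synthetic = [(x, minority) for x in smote_like(min_sv, n_add, rng=rng)]
    kept_nsv = rng.sample(maj_nsv, len(maj_nsv) - n_drop)  # random under-sampling

    resampled = min_all + synthetic + maj_sv + kept_nsv
    return [x for x, _ in resampled], [lab for _, lab in resampled]


def g_mean(y_true, y_pred, minority=1):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

The resampled set returned by `dcs_resample` would then be used to train the classifiers of the ensemble; the ensemble step itself is not sketched here.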
Pages: 115-122
Page count: 8
Related papers
50 in total
  • [1] Combine Sampling Support Vector Machine for Imbalanced Data Classification
    Sain, Hartayuni
    Purnami, Santi Wulan
    THIRD INFORMATION SYSTEMS INTERNATIONAL CONFERENCE 2015, 2015, 72 : 59 - 66
  • [2] EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data
    Jung, Ilok
    Ji, Jaewon
    Cho, Changseob
    ELECTRONICS, 2022, 11 (09)
  • [3] A novel ensemble method for classifying imbalanced data
    Sun, Zhongbin
    Song, Qinbao
    Zhu, Xiaoyan
    Sun, Heli
    Xu, Baowen
    Zhou, Yuming
    PATTERN RECOGNITION, 2015, 48 (05) : 1623 - 1637
  • [4] Fuzzy Support Vector Machine With Relative Density Information for Classifying Imbalanced Data
    Yu, Hualong
    Sun, Changyin
    Yang, Xibei
    Zheng, Shang
    Zou, Haitao
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2019, 27 (12) : 2353 - 2367
  • [5] A New Method for Classifying Random Variables Based on Support Vector Machine
    Maryam Abaszade
    Sohrab Effati
    Journal of Classification, 2019, 36 : 152 - 174
  • [6] A New Method for Classifying Random Variables Based on Support Vector Machine
    Abaszade, Maryam
    Effati, Sohrab
    JOURNAL OF CLASSIFICATION, 2019, 36 (01) : 152 - 174
  • [7] Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data
    Yu, Hualong
    Mu, Chaoxu
    Sun, Changyin
    Yang, Wankou
    Yang, Xibei
    Zuo, Xin
    KNOWLEDGE-BASED SYSTEMS, 2015, 76 : 67 - 78
  • [8] Adaptive Ensemble Method Based on Spatial Characteristics for Classifying Imbalanced Data
    Wang, Lei
    Zhao, Lei
    Gui, Guan
    Zheng, Baoyu
    Huang, Ruochen
    SCIENTIFIC PROGRAMMING, 2017, 2017
  • [9] Imbalanced classification using support vector machine ensemble
    Jiang Tian
    Hong Gu
    Wenqi Liu
    Neural Computing and Applications, 2011, 20 : 203 - 209
  • [10] Imbalanced classification using support vector machine ensemble
    Tian, Jiang
    Gu, Hong
    Liu, Wenqi
    NEURAL COMPUTING & APPLICATIONS, 2011, 20 (02) : 203 - 209