A hierarchical VQSVM for imbalanced data sets

Cited by: 4
Authors
Yu, Ting [1 ]
Jan, Tony [1 ]
Simoff, Simeon [1 ]
Debenham, John [1 ]
Affiliation
[1] Univ Technol Sydney, Fac Informat Technol, Sydney, NSW 2007, Australia
DOI
10.1109/IJCNN.2007.4371010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper first introduces VQSVM, a hierarchical modelling method, and discusses some of its properties. It then applies VQSVM to a nonstandard learning setting: imbalanced data sets. On extremely imbalanced, high-dimensional data sets, standard machine learning techniques tend to be overwhelmed by the large classes. The hierarchical VQSVM combines a set of local models, i.e. the codevectors produced by Vector Quantization, with a global model, i.e. a Support Vector Machine, to rebalance data sets without significant information loss. Issues such as distortion and support vectors are discussed to address the trade-off between information loss and undersampling rate. Experiments compare VQSVM with random resampling techniques on imbalanced data sets with varied imbalance ratios. The results show that VQSVM performs as well as or better than random resampling, especially on extremely imbalanced large data sets.
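The rebalancing step described in the abstract can be illustrated with a minimal sketch. It assumes plain Lloyd-style k-means as the vector quantizer and a codebook size equal to the minority class size; neither choice is confirmed by the abstract, and the function names (`vector_quantize`, `rebalance`) and the synthetic data are hypothetical. The sketch replaces the majority class with its codevectors and reports the mean distortion, which is the quantity traded off against the undersampling rate. Training the global SVM on the rebalanced set is left out.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(points, k, iterations=20, seed=0):
    """Lloyd-style k-means used as a simple vector quantizer (an assumption).

    Returns (codevectors, distortion), where distortion is the mean squared
    distance from each point to its nearest codevector.
    """
    rng = random.Random(seed)
    codevectors = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, codevectors[j]))
            cells[nearest].append(p)
        for j, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codevector
                codevectors[j] = [sum(col) / len(cell) for col in zip(*cell)]
    distortion = sum(
        min(squared_distance(p, c) for c in codevectors) for p in points
    ) / len(points)
    return codevectors, distortion


def rebalance(majority, minority, k=None):
    """Replace the majority class with k codevectors (default: minority size)."""
    k = k or len(minority)
    codevectors, distortion = vector_quantize(majority, k)
    X = codevectors + [list(p) for p in minority]
    y = [0] * len(codevectors) + [1] * len(minority)
    return X, y, distortion


# Synthetic 20:1 imbalanced data, for illustration only.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

X, y, distortion = rebalance(majority, minority)
print(len(X), y.count(0), y.count(1))
print("mean distortion of the majority class:", distortion)
```

A smaller codebook gives a higher undersampling rate but usually a larger distortion, so `k` is the knob that controls the information loss the abstract refers to. The rebalanced `X, y` would then be passed to an SVM trainer.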
Pages: 518-523
Number of pages: 6
Related papers
50 in total
  • [41] Editing Training Sets from Imbalanced Data Using Fuzzy-Rough Sets
    Nguyen, Do Van
    Ogawa, Keisuke
    Matsumoto, Kazunori
    Hashimoto, Masayuki
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, 2015, 458 : 115 - 129
  • [42] An empirical study of the behavior of classifiers on imbalanced and overlapped data sets
    Garcia, Vicente
    Sanchez, Jose
    Mollineda, Ramon
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2007, 4756 : 397 - +
  • [43] Evolutionary rule-based systems for imbalanced data sets
    Orriols-Puig, Albert
    Bernado-Mansilla, Ester
    SOFT COMPUTING, 2009, 13 (03) : 213 - 225
  • [44] Dealing with difficult minority labels in imbalanced mutilabel data sets
    Charte, Francisco
    Rivera, Antonio J.
    del Jesus, Maria J.
    Herrera, Francisco
    NEUROCOMPUTING, 2019, 326 : 39 - 53
  • [45] Diversity Analysis on Imbalanced Data Sets by Using Ensemble Models
    Wang, Shuo
    Yao, Xin
    2009 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, 2009, : 324 - 331
  • [46] Improving SVM Classification on Imbalanced Data Sets in Distance Spaces
    Koeknar-Tezel, Suzan
    Latecki, Longin Jan
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 259 - +
  • [47] A multiple resampling method for learning from imbalanced data sets
    Estabrooks, A
    Jo, TH
    Japkowicz, N
    COMPUTATIONAL INTELLIGENCE, 2004, 20 (01) : 18 - 36
  • [48] A wrapper for reweighting training instances for handling imbalanced data sets
    Karagiannopoulos, M.
    Anyfantis, D.
    Kotsiantis, S.
    Pintelas, P.
    ARTIFICIAL INTELLIGENCE AND INNOVATIONS 2007: FROM THEORY TO APPLICATIONS, 2007, : 29 - +
  • [49] Random Forests lithology prediction method for imbalanced data sets
    Wang G.
    Song J.
    Xu F.
    Zhang W.
    Liu J.
    Chen F.
    Shiyou Diqiu Wuli Kantan/Oil Geophysical Prospecting, 2021, 56 (04) : 679 - 687
  • [50] Difficulty Factors and Preprocessing in Imbalanced Data Sets: An Experimental Study on Artificial Data
    Wojciechowski S.
    Wilk S.
    1600, Walter de Gruyter GmbH (42): : 149 - 176