A hierarchical VQSVM for imbalanced data sets

Times cited: 4
Authors
Yu, Ting [1]
Jan, Tony [1]
Simoff, Simeon [1]
Debenham, John [1]
Affiliation
[1] Univ Technol Sydney, Fac Informat Technol, Sydney, NSW 2007, Australia
DOI
10.1109/IJCNN.2007.4371010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
First, a hierarchical modelling method, VQSVM, is introduced and discussed. Second, the proposed VQSVM is applied to a nonstandard learning environment: imbalanced data sets. On extremely imbalanced, high-dimensional datasets, standard machine learning techniques tend to be overwhelmed by the large classes. The hierarchical VQSVM combines a set of local models, i.e., codevectors produced by Vector Quantization (VQ), with a global model, i.e., a Support Vector Machine (SVM), to rebalance datasets without significant information loss. Issues such as distortion and support vectors are discussed to address the trade-off between information loss and the undersampling rate. Experiments compare VQSVM with random resampling techniques on several imbalanced datasets with varied imbalance ratios, and the results show that VQSVM performs better than or on par with random resampling techniques, especially on extremely imbalanced large datasets.
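The abstract describes VQ compressing the large class into codevectors that, together with the minority samples, form a rebalanced training set for the SVM. A minimal pure-Python sketch of that rebalancing step follows, under stated assumptions: plain k-means (Lloyd) iterations stand in for the vector quantizer, the codebook size defaults to the minority-class count, labels are -1 for the majority class and +1 for the minority class, and the helper names (`train_codebook`, `distortion`, `vq_rebalance`) are hypothetical. It is not the authors' implementation, and the VQ algorithm used in the paper may differ.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Sequence[Vector], x: Sequence[float]) -> int:
    """Index of the codevector closest to x."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], x))


def train_codebook(data: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Assumption: plain k-means (Lloyd iterations) used as the vector quantizer."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(list(data), k)]  # init from k data points
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in range(k)]
        for x in data:
            cells[nearest(codebook, x)].append(list(x))
        for j, members in enumerate(cells):
            if members:  # keep the previous codevector if its cell is empty
                dim = len(members[0])
                codebook[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


def distortion(data: Sequence[Vector], codebook: Sequence[Vector]) -> float:
    """Mean squared quantization error, used here as a proxy for information loss."""
    return sum(sq_dist(x, codebook[nearest(codebook, x)]) for x in data) / len(data)


def vq_rebalance(
    majority: Sequence[Vector], minority: Sequence[Vector], k: Optional[int] = None, seed: int = 0
) -> Tuple[List[Vector], List[int], float]:
    """Replace the majority class by k codevectors (default: minority size)."""
    if not majority or not minority:
        raise ValueError("both classes must be non-empty")
    k = min(k if k is not None else len(minority), len(majority))
    codebook = train_codebook(majority, k, seed=seed)
    X = codebook + [list(v) for v in minority]
    y = [-1] * len(codebook) + [1] * len(minority)
    return X, y, distortion(majority, codebook)


# Toy usage: 200 majority points around (0, 0), 10 minority points around (3, 3).
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
X, y, d = vq_rebalance(majority, minority)
print(len(X), y.count(-1), y.count(1), round(d, 3))
```

The returned `X, y` can then be passed to any SVM trainer. A smaller `k` undersamples the majority class more aggressively and generally raises the distortion `d`, which mirrors the trade-off between information loss and undersampling rate mentioned in the abstract.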
Pages: 518-523
Number of pages: 6