A hierarchical VQSVM for imbalanced data sets

Cited by: 4
Authors
Yu, Ting [1]
Jan, Tony [1]
Simoff, Simeon [1]
Debenham, John [1]
Affiliations
[1] Univ Technol Sydney, Fac Informat Technol, Sydney, NSW 2007, Australia
DOI
10.1109/IJCNN.2007.4371010
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
First, a hierarchical modelling method, VQSVM, is introduced and discussed. Second, the proposed VQSVM is applied to a nonstandard learning environment: imbalanced data sets. On extremely imbalanced, high-dimensional datasets, standard machine learning techniques tend to be overwhelmed by the large classes. The hierarchical VQSVM combines a set of local models, i.e. codevectors produced by Vector Quantization, with a global model, i.e. a Support Vector Machine, to rebalance datasets without significant information loss. Issues such as distortion and support vectors are discussed to address the trade-off between information loss and undersampling rate. Experiments compare VQSVM with random resampling techniques on imbalanced datasets with varied imbalance ratios; the results show that VQSVM performs as well as or better than random resampling, especially on extremely imbalanced large datasets.
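The rebalancing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the vector quantizer or the codebook size, so this sketch assumes Lloyd's algorithm (k-means) as the quantizer and a codebook as large as the minority class. The function names `vq_codebook`, `distortion` and `rebalance` are hypothetical. The majority class is compressed into codevectors (the local models), and the reported distortion gives a rough measure of the information lost by undersampling. Training the global SVM on the rebalanced set is left out to keep the example dependency-free.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_codebook(points, k, iterations=20, seed=0):
    """Build k codevectors with Lloyd's algorithm (assumed stand-in for the VQ step)."""
    rng = random.Random(seed)
    # Initialise the codebook with k distinct samples from the data.
    codevectors = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assign each point to the cell of its nearest codevector.
        cells = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, codevectors[j]))
            cells[nearest].append(p)
        # Move each codevector to the mean of its cell; leave it alone if the cell is empty.
        for j, cell in enumerate(cells):
            if cell:
                codevectors[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return codevectors


def distortion(points, codevectors):
    """Mean squared distance from each point to its nearest codevector."""
    total = sum(min(squared_distance(p, c) for c in codevectors) for p in points)
    return total / len(points)


def rebalance(majority, minority, seed=0):
    """Replace the majority class with as many codevectors as there are minority samples."""
    codebook = vq_codebook(majority, len(minority), seed=seed)
    X = codebook + [list(p) for p in minority]
    y = [0] * len(codebook) + [1] * len(minority)  # 0 = majority, 1 = minority
    return X, y, distortion(majority, codebook)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic 50:1 imbalanced data, purely for illustration.
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]
    X, y, d = rebalance(majority, minority)
    print("balanced size:", len(X), "minority count:", sum(y), "distortion:", d)
```

The balanced pair `(X, y)` would then be passed to an SVM trainer of the reader's choice. A larger codebook lowers the distortion but undersamples less, which is the trade-off the abstract refers to.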
Pages: 518 - 523
Page count: 6
Related papers
50 results in total
  • [21] Online Nonlinear AUC Maximization for Imbalanced Data Sets
    Hu, Junjie
    Yang, Haiqin
    Lyu, Michael R.
    King, Irwin
    So, Anthony Man-Cho
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (04) : 882 - 895
  • [22] Hybrid kernel machine ensemble for imbalanced data sets
    Li, Peng
    Chan, Kap Luk
    Fang, Wen
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 1108 - +
  • [23] Incremental label propagation for data sets with imbalanced labels
    Li, Yaoxing
    Bai, Liang
    Liang, Zhuomin
    Du, Hangyuan
    NEUROCOMPUTING, 2023, 535 : 144 - 155
  • [24] Training Deep Neural Networks on Imbalanced Data Sets
    Wang, Shoujin
    Liu, Wei
    Wu, Jia
    Cao, Longbing
    Meng, Qinxue
    Kennedy, Paul J.
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 4368 - 4374
  • [25] An Improved Algorithm for SVMs Classification of Imbalanced Data Sets
    Castro, Cristiano Leite
    Carvalho, Mateus Araujo
    Braga, Antonio Padua
    ENGINEERING APPLICATIONS OF NEURAL NETWORKS, PROCEEDINGS, 2009, 43 : 108 - 118
  • [26] Classification of imbalanced marketing data with balanced random sets
    Nikulin, Vladimir
    McLachlan, Geoffrey J.
    Journal of Machine Learning Research, 2009, 7 : 89 - 100
  • [27] FUZZY AND SMOTE RESAMPLING TECHNIQUE FOR IMBALANCED DATA SETS
    Zorkeflee, Maisarah
    Din, Aniza Mohamed
    Ku-Mahamud, Ku Ruhana
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON COMPUTING & INFORMATICS, 2015, : 638 - 643
  • [28] Data Augmentation Meta-Classifier Scheme for imbalanced data sets
    Moreno-Barea, Francisco J.
    Jerez, Jose M.
    Franco, Leonardo
    2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 1392 - 1399
  • [29] A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets
    Lopez, Victoria
    Fernandez, Alberto
    Jose del Jesus, Maria
    Herrera, Francisco
    KNOWLEDGE-BASED SYSTEMS, 2013, 38 : 85 - 104
  • [30] Diversity Exploration and Negative Correlation Learning on Imbalanced Data Sets
    Wang, Shuo
    Tang, Ke
    Yao, Xin
    IJCNN: 2009 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1- 6, 2009, : 1796 - +