A hierarchical VQSVM for imbalanced data sets

Cited by: 4
Authors
Yu, Ting [1]
Jan, Tony [1]
Simoff, Simeon [1]
Debenham, John [1]
Affiliation
[1] Univ Technol Sydney, Fac Informat Technol, Sydney, NSW 2007, Australia
DOI
10.1109/IJCNN.2007.4371010
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper first introduces a hierarchical modelling method, VQSVM, with some remarks on the method. It then applies VQSVM to a nonstandard learning setting: imbalanced data sets. On extremely imbalanced, high-dimensional data sets, standard machine learning techniques tend to be overwhelmed by the large classes. The hierarchical VQSVM combines a set of local models, i.e. the codevectors produced by vector quantization, with a global model, i.e. a support vector machine, to rebalance data sets without significant information loss. Issues such as distortion and support vectors are discussed to address the trade-off between information loss and the undersampling rate. Experiments compare VQSVM with random resampling techniques on several imbalanced data sets with varied imbalance ratios; the results show that VQSVM performs as well as or better than random resampling, especially on extremely imbalanced large data sets.
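The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the rebalancing idea it describes, not the authors' implementation. It assumes plain k-means (Lloyd's algorithm) as the vector quantizer, assumes the number of codevectors is set to the minority-class size, and uses mean squared distance to the nearest codevector as the distortion. All function names (`vector_quantize`, `rebalance`) and the toy data are hypothetical. The global SVM step is omitted; the rebalanced `X`, `y` would be passed to any SVM implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vector_quantize(points, k, iters=20, seed=0):
    """Build a k-entry codebook with Lloyd's k-means (an assumed choice of quantizer).

    Returns (codebook, distortion), where distortion is the mean squared
    distance from each point to its nearest codevector.
    """
    rng = random.Random(seed)
    codebook = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign every point to the cell of its nearest codevector.
        cells = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, codebook[i]))
            cells[j].append(p)
        # Move each codevector to the centroid of its cell; keep it if the cell is empty.
        for i, cell in enumerate(cells):
            if cell:
                codebook[i] = [sum(c) / len(cell) for c in zip(*cell)]
    distortion = sum(min(sq_dist(p, c) for c in codebook) for p in points) / len(points)
    return codebook, distortion

def rebalance(majority, minority, k=None):
    """Replace the majority class by k codevectors (default: the minority size)."""
    k = k or len(minority)
    codebook, distortion = vector_quantize(majority, k)
    X = codebook + [list(p) for p in minority]
    y = [0] * len(codebook) + [1] * len(minority)
    return X, y, distortion

# Toy 2-D data with a 50:1 imbalance ratio (illustrative only).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

X, y, distortion = rebalance(majority, minority)
print(len(X), y.count(0), y.count(1))
```

In this sketch, fewer codevectors mean a higher undersampling rate but also a higher distortion, which is one simple way to picture the trade-off between information loss and undersampling rate that the abstract mentions.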
Pages: 518-523
Page count: 6