Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited by: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Affiliation
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
Keywords
DOI
10.1007/978-3-319-01595-8_10
Chinese Library Classification (CLC): TP [automation technology; computer technology]
Subject classification code: 0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification. However, their high computational cost, with training time growing roughly cubically in the number of training examples, becomes problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM, a simple stepwise procedure in which SVMs are iteratively trained on subsets of the original data set and the support vectors of the resulting models are combined to form new training sets. The general idea is to bound the size of every training set that must be handled and thereby obtain a significant speedup. A further advantage is that the approach parallelizes easily, because the models fitted within each stage of the cascade are independent of one another. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and includes an adaptive stopping rule for selecting the number of stages, which improves accuracy.
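The single-stage cascade logic described in the abstract can be sketched in a few lines. The snippet below is only a minimal illustration in Python using scikit-learn and joblib; the library choice, parameter values, and helper names (fit_chunk, cascade_pass, n_chunks) are assumptions made for illustration and do not reproduce the authors' implementation.

    # Minimal sketch of one Cascade SVM stage (illustrative, not the authors' code).
    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.svm import SVC

    def fit_chunk(X, y):
        # Fit an SVM on one data chunk and return the indices of its support vectors.
        model = SVC(kernel="rbf", C=1.0)
        model.fit(X, y)
        return model.support_

    def cascade_pass(X, y, n_chunks=4, n_jobs=4):
        # One cascade stage: train independent SVMs on disjoint chunks in parallel
        # and keep only the union of their support vectors as the new training set.
        chunks = np.array_split(np.random.permutation(len(X)), n_chunks)
        local_svs = Parallel(n_jobs=n_jobs)(
            delayed(fit_chunk)(X[idx], y[idx]) for idx in chunks
        )
        # Map chunk-local support-vector indices back to positions in the full data set.
        keep = np.concatenate([idx[sv] for idx, sv in zip(chunks, local_svs)])
        return X[keep], y[keep]

    # Repeated application shrinks the training set stage by stage; a final SVM is
    # then fitted on the surviving support vectors.

Each sub-problem in such a pass contains only about len(X) / n_chunks examples, which is where the speedup over a single SVM fit on the full data set comes from.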
Pages: 87-95 (9 pages)
Related papers (50 in total)
  • [31] A comparison of validation methods for learning vector quantization and for support vector machines on two biomedical data sets
    Sommer, D
    Golz, M
    From Data and Information Analysis to Knowledge Engineering, 2006, : 150 - 157
  • [32] Twin support vector machines based on rough sets
    Yu, J.
    Advanced Institute of Convergence Information Technology, (06)
  • [33] Selecting training sets for support vector machines: a review
    Nalepa, Jakub
    Kawulok, Michal
    ARTIFICIAL INTELLIGENCE REVIEW, 2019, 52 (02) : 857 - 900
  • [34] Learning Confidence Sets using Support Vector Machines
    Wang, Wenbo
    Qiao, Xingye
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [36] A Hybrid Algorithm to Improve the Accuracy of Support Vector Machines on Skewed Data-Sets
    Cervantes, Jair
    Huang, De-Shuang
    Garcia-Lamont, Farid
    Lopez Chau, Asdrubal
    INTELLIGENT COMPUTING THEORY, 2014, 8588 : 782 - 788
  • [37] Data Augmentation for Support Vector Machines
    Mallick, Bani K.
    Chakraborty, Sounak
    Ghosh, Malay
    BAYESIAN ANALYSIS, 2011, 6 (01): 25 - 29
  • [38] Support vector machines for dyadic data
    Hochreiter, Sepp
    Obermayer, Klaus
    NEURAL COMPUTATION, 2006, 18 (06) : 1472 - 1510
  • [39] Support Vector Classification for Large Data Sets by Reducing Training Data with Change of Classes
    Cervantes, Jair
    Li, Xiaoou
    Yu, Wen
    2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 2608 - +
  • [40] Comments on the "Core Vector Machines: Fast SVM training on very large data sets"
    Loosli, Gaelle
    Canu, Stephane
    JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 291 - 301