Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited by: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Affiliation
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
Keywords
DOI
10.1007/978-3-319-01595-8_10
Chinese Library Classification (CLC): TP [automation technology; computer technology]
Subject classification code: 0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification. However, their high computational cost, with training time growing roughly cubically in the number of training examples, becomes problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM, a simple stepwise procedure in which SVMs are iteratively trained on subsets of the original data set and the support vectors of the resulting models are combined to form new training sets. The general idea is to bound the size of every training set that must be handled and thereby obtain a significant speedup. A further advantage is that the approach parallelizes easily, because the models fitted within each stage of the cascade are independent of one another. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and includes an adaptive stopping rule for selecting the number of stages, which improves accuracy.
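The single-stage cascade logic described in the abstract can be sketched in a few lines. The snippet below is only a minimal illustration in Python using scikit-learn and joblib; the library choice, parameter values, and helper names (fit_chunk, cascade_pass, n_chunks) are assumptions made for illustration and do not reproduce the authors' implementation.

    # Minimal sketch of one Cascade SVM stage (illustrative, not the authors' code).
    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.svm import SVC

    def fit_chunk(X, y):
        # Fit an SVM on one data chunk and return the indices of its support vectors.
        model = SVC(kernel="rbf", C=1.0)
        model.fit(X, y)
        return model.support_

    def cascade_pass(X, y, n_chunks=4, n_jobs=4):
        # One cascade stage: train independent SVMs on disjoint chunks in parallel
        # and keep only the union of their support vectors as the new training set.
        chunks = np.array_split(np.random.permutation(len(X)), n_chunks)
        local_svs = Parallel(n_jobs=n_jobs)(
            delayed(fit_chunk)(X[idx], y[idx]) for idx in chunks
        )
        # Map chunk-local support-vector indices back to positions in the full data set.
        keep = np.concatenate([idx[sv] for idx, sv in zip(chunks, local_svs)])
        return X[keep], y[keep]

    # Repeated application shrinks the training set stage by stage; a final SVM is
    # then fitted on the surviving support vectors.

Each sub-problem in such a pass contains only about len(X) / n_chunks examples, which is where the speedup over a single SVM fit on the full data set comes from.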
Pages: 87-95 (9 pages)
Related papers (50 in total)
  • [31] A comparison of validation methods for learning vector quantization and for support vector machines on two biomedical data sets
    Sommer, D
    Golz, M
    From Data and Information Analysis to Knowledge Engineering, 2006, : 150 - 157
  • [32] Twin support vector machines based on rough sets
    Yu, J.
    Advanced Institute of Convergence Information Technology, (06)
  • [33] Selecting training sets for support vector machines: a review
    Nalepa, Jakub
    Kawulok, Michal
    ARTIFICIAL INTELLIGENCE REVIEW, 2019, 52 (02) : 857 - 900
  • [34] Learning Confidence Sets using Support Vector Machines
    Wang, Wenbo
    Qiao, Xingye
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [36] A Hybrid Algorithm to Improve the Accuracy of Support Vector Machines on Skewed Data-Sets
    Cervantes, Jair
    Huang, De-Shuang
    Garcia-Lamont, Farid
    Lopez Chau, Asdrubal
    INTELLIGENT COMPUTING THEORY, 2014, 8588 : 782 - 788
  • [37] Data Augmentation for Support Vector Machines
    Mallick, Bani K.
    Chakraborty, Sounak
    Ghosh, Malay
    BAYESIAN ANALYSIS, 2011, 6 (01): 25 - 29
  • [38] Support vector machines for dyadic data
    Hochreiter, Sepp
    Obermayer, Klaus
    NEURAL COMPUTATION, 2006, 18 (06) : 1472 - 1510
  • [39] Support Vector Classification for Large Data Sets by Reducing Training Data with Change of Classes
    Cervantes, Jair
    Li, Xiaoou
    Yu, Wen
    2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 2608 - +
  • [40] Comments on the "Core Vector Machines: Fast SVM training on very large data sets"
    Loosli, Gaelle
    Canu, Stephane
    JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 291 - 301