A fast classification strategy for SVM on the large-scale high-dimensional datasets

Cited by: 0

Authors:

I-Jing Li

Jiunn-Lin Wu

Chih-Hung Yeh

Affiliations:

[1] National Taichung University of Science and Technology, Department of Applied Statistics

[2] National Chung Hsing University, Department of Computer Science and Engineering

Source:

Pattern Analysis and Applications | 2018 / Vol. 21

Keywords:

Profile support vector machine; Large-scale datasets; High-dimensional data; MagKmeans algorithm; Fast condensed nearest neighbor rule;

DOI:

Not available

Chinese Library Classification (CLC) number:

Subject classification code:

Abstract:

Classifying large-scale, high-dimensional datasets poses three challenges: (1) the training and classification phases carry a heavy computational burden; (2) storing many training samples requires a large amount of memory; and (3) decision rules are difficult to determine in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily overfits, especially when the data are not evenly distributed. The profile support vector machine (PSVM) was recently proposed to solve this problem. Because local learning is superior to global learning, PSVM trains multiple linear SVM models to achieve performance similar to that of a nonlinear SVM model. However, it is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that reduces both the training time and the classification time. We first choose border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets by the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. The clusters are then used to learn multiple linear SVM models. Both artificial and real datasets are used to evaluate the performance of the proposed method. In the experimental results, the proposed method avoids both overfitting and underfitting, and the proposed strategy is effective and efficient.

Pages: 1023-1038

Page count: 15
