A fast classification strategy for SVM on the large-scale high-dimensional datasets

Cited by: 0
Authors
I-Jing Li
Jiunn-Lin Wu
Chih-Hung Yeh
Affiliations
[1] National Taichung University of Science and Technology, Department of Applied Statistics
[2] National Chung Hsing University, Department of Computer Science and Engineering
Keywords
Profile support vector machine; Large-scale datasets; High-dimensional data; MagKmeans algorithm; Fast condensed nearest neighbor rule
DOI
Not available
Abstract
Classifying large-scale, high-dimensional datasets poses three challenges: (1) a heavy computational burden in both the training and classification phases; (2) large storage requirements for the many training samples; and (3) the difficulty of determining decision rules in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it is prone to overfitting, especially when the data are unevenly distributed. The profile support vector machine (PSVM) was recently proposed to address this problem: because local learning is superior to global learning, PSVM trains multiple linear SVM models that together achieve performance similar to that of a single nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that reduces both the training time and the classification time. We first select border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets with the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. Finally, the clusters are used to train the multiple linear SVM models. Both artificial and real datasets are used to evaluate the proposed method. The experimental results show that the proposed method avoids both overfitting and underfitting, and that the proposed strategy is effective and efficient.
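The abstract names the stages of the strategy without specifying them. The Python sketch below is one minimal, hypothetical way those stages could fit together; it is not the authors' implementation, and every function name in it is invented for illustration. Three substitutions are assumptions: the border-sample step uses an FCNN1-style condensing rule (the keywords mention the fast condensed nearest neighbor rule, but the abstract does not say how it is used), the MagKmeans step is replaced by plain k-means (neither MagKmeans nor the proposed fast search method is reproduced), and each local linear SVM is trained with a simple Pegasos-style subgradient solver. Routing a test point to the model of its nearest cluster centroid is also an assumption.

```python
# Hypothetical sketch, NOT the authors' code: FCNN1 border selection, plain k-means
# standing in for MagKmeans, and a Pegasos-style linear SVM. Labels are +1 / -1.


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(point, candidates):
    """Index of the candidate closest to point (ties go to the lowest index)."""
    return min(range(len(candidates)), key=lambda i: sqdist(point, candidates[i]))


def fcnn1(X, y):
    """Step 1: FCNN1-style condensing; returns indices of a 1-NN-consistent subset."""
    kept = []
    for c in sorted(set(y)):  # seed: per class, the sample closest to the class mean
        idx = [i for i in range(len(X)) if y[i] == c]
        mean = [sum(X[i][d] for i in idx) / len(idx) for d in range(len(X[0]))]
        kept.append(min(idx, key=lambda i: sqdist(X[i], mean)))
    while True:
        protos, kept_set = [X[i] for i in kept], set(kept)
        reps = {}  # prototype position -> closest misclassified sample in its Voronoi cell
        for q in range(len(X)):
            if q in kept_set:
                continue
            p = nearest(X[q], protos)
            if y[q] != y[kept[p]] and (
                p not in reps or sqdist(X[q], protos[p]) < sqdist(X[reps[p]], protos[p])
            ):
                reps[p] = q
        if not reps:  # every training sample is now classified correctly by 1-NN
            return kept
        kept.extend(sorted(reps.values()))


def kmeans(X, k, iters=20):
    """Step 2 stand-in: plain Lloyd k-means (NOT MagKmeans); needs len(X) >= k."""
    cents = [list(X[i]) for i in range(k)]  # deterministic init: the first k points
    for _ in range(iters):
        assign = [nearest(x, cents) for x in X]
        for j in range(k):
            members = [X[i] for i in range(len(X)) if assign[i] == j]
            if members:  # an empty cluster keeps its previous centroid
                cents[j] = [sum(col) / len(members) for col in zip(*members)]
    return cents, [nearest(x, cents) for x in X]


def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Step 3: hinge loss + L2 penalty, Pegasos-style steps in a fixed cyclic order.
    A constant 1.0 feature is appended, so the bias is learned (and regularized)."""
    w, t = [0.0] * (len(X[0]) + 1), 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            xa = list(x) + [1.0]
            margin = label * sum(a * b for a, b in zip(w, xa))
            w = [(1 - eta * lam) * a for a in w]  # shrink step from the L2 term
            if margin < 1:  # hinge-loss subgradient step
                w = [a + eta * label * b for a, b in zip(w, xa)]
    return w


def fit_local_svms(X, y, k):
    """Reduce, cluster, then fit one model per non-empty cluster on centered data."""
    keep = fcnn1(X, y)
    Xr, yr = [X[i] for i in keep], [y[i] for i in keep]
    cents, assign = kmeans(Xr, k)
    used, models = [], []
    for j in sorted(set(assign)):
        members = [i for i in range(len(Xr)) if assign[i] == j]
        Xj = [[a - c for a, c in zip(Xr[i], cents[j])] for i in members]
        yj = [yr[i] for i in members]
        used.append(cents[j])
        # A single-class cluster predicts its class; otherwise train a local linear SVM.
        models.append(("const", yj[0]) if len(set(yj)) == 1 else ("svm", train_linear_svm(Xj, yj)))
    return used, models


def predict(model, x):
    """Route x to the nearest centroid and apply that cluster's local model."""
    cents, models = model
    j = nearest(x, cents)
    kind, param = models[j]
    if kind == "const":
        return param
    u = [a - c for a, c in zip(x, cents[j])] + [1.0]
    return 1 if sum(a * b for a, b in zip(param, u)) >= 0 else -1


if __name__ == "__main__":
    X = [[0], [1], [3], [4], [20], [21], [23], [24]]
    y = [-1, -1, 1, 1, 1, 1, -1, -1]
    model = fit_local_svms(X, y, k=2)
    print([predict(model, x) for x in X])  # → [-1, -1, 1, 1, 1, 1, -1, -1]
```

On the toy 1-D data in the `__main__` block, the labels change sign twice along the axis, so no single linear classifier (one threshold in 1-D) can fit them, whereas two local linear models classify every training point correctly. Centering each cluster's data on its centroid before training is a choice made for this sketch, so that the regularized bias term does not have to absorb a cluster's offset from the origin. The consistency guarantee of `fcnn1` assumes the training set contains no duplicate points with conflicting labels.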
Pages: 1023–1038
Number of pages: 16