A novel robust kernel for classifying high-dimensional data using Support Vector Machines

Cited by: 41
Authors
Hussain, Syed Fawad [1]
Affiliations
[1] Ghulam Ishaq Khan Inst Engn Sci & Technol, Machine Learning & Data Sci MDS Lab, Fac Comp Sci & Engn, Topi, Pakistan
Keywords
Semantic kernels; Support Vector Machines; Co-clustering; Label noise; Text classification; Classifiers; Algorithm
DOI
10.1016/j.eswa.2019.04.037
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new semantic kernel for classifying high-dimensional data in the framework of Support Vector Machines (SVMs). SVMs have gained widespread application due to their relatively high accuracy. The efficacy of an SVM, however, depends on how separable the data are as well as on the kernel function. Text data, for instance, are difficult to classify because of synonymy and polysemy in their contents, multi-topical instances that can result in mislabeling, and high sparsity in the bag-of-words representation. While the soft margin parameter and kernel tricks are used in SVMs to deal with outliers and non-linearly separable data, the use of data statistics and correlation has not been fully explored in the literature. Motivated by the success of co-clustering and subspace clustering methods, this paper explores the use of co-similarity (i.e., soft co-clustering) to find latent relationships between documents. It has been shown that weighted higher-order paths between instances in the data can be a good measure of similarity, which can then be used both for classification and to correct mislabeled (or outlier) data in the training set. The proposed kernel is generic in nature and suitable for sparse, dyadic data where direct co-occurrences are not necessarily common, as in the case of textual data, link analysis in social media networks, co-authorship, etc. The paper also studies the impact of noise in the training data and provides a technique to re-label such instances. It is observed that re-labelling selected training data reduces the adverse effect of outliers or label noise and can greatly improve the classification of the test data. To the best of our knowledge, we are the first to introduce a supervised co-similarity based kernel function and to provide a mathematical formulation showing that it is a valid Mercer kernel.
Our experiments show that the proposed framework outperforms current state-of-the-art methods in terms of classification accuracy and is more resilient to label noise. (C) 2019 Elsevier Ltd. All rights reserved.
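The abstract's idea of weighted higher-order paths can be illustrated with a small sketch. This is not the kernel proposed in the paper, whose formulation is not given in this record; it is a generic stand-in that sums decayed powers of the document Gram matrix, so that two documents sharing no terms can still be similar through a third document. The function name `higher_order_path_kernel`, the parameters `order` and `decay`, and the toy matrix `A` are all assumptions made for illustration.

```python
import numpy as np


def higher_order_path_kernel(A, order=2, decay=0.5):
    """Hypothetical illustration, not the paper's kernel.

    A is a (documents x terms) matrix with non-negative entries.
    Returns K = sum_{k=1..order} decay**k * (A A^T)**k, cosine-normalised.
    Each power of the PSD matrix A A^T is PSD, and a positive-weighted sum
    of PSD matrices is PSD, so K is a valid Gram matrix.
    """
    A = np.asarray(A, dtype=float)
    G = A @ A.T                      # first-order paths: shared terms
    K = np.zeros_like(G)
    P = np.eye(G.shape[0])
    for k in range(1, order + 1):
        P = P @ G                    # paths passing through k term-sharing steps
        K += decay ** k * P
    d = np.sqrt(np.diag(K))
    d[d == 0] = 1.0                  # leave all-zero documents at similarity 0
    return K / np.outer(d, d)


# Toy data (assumed): docs 0 and 1 share no term, but both overlap with doc 2.
# Doc 3 is isolated from the rest.
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])

K1 = higher_order_path_kernel(A, order=1)  # direct co-occurrence only
K2 = higher_order_path_kernel(A, order=2)  # adds second-order paths

print("first-order  sim(doc0, doc1):", K1[0, 1])
print("second-order sim(doc0, doc1):", K2[0, 1])
```

With first-order paths only, documents 0 and 1 have zero similarity; adding second-order paths gives them a positive similarity through document 2, while the isolated document 3 stays at zero. A Gram matrix of this kind could be passed to an SVM implementation that accepts precomputed kernels.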
Pages: 116-131
Number of pages: 16
Related papers
50 in total
  • [31] Classification of Remote Sensed Data Using Linear Kernel Based Support Vector Machines
    Rao, Tarun
    Rajasekhar, N.
    Rajinikanth, T. V.
    Sundar, K. S.
    2013 INTERNATIONAL CONFERENCE ON CONTROL COMMUNICATION AND COMPUTING (ICCC), 2013, : 22 - +
  • [32] Robust Local Triangular Kernel Density-based Clustering for High-dimensional Data
    Musdholifah, Aina
    Hashim, Siti Zaiton Mohd
    2013 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSIT), 2013, : 24 - 32
  • [33] Classifying segmented hyperspectral data from a heterogeneous urban environment using support vector machines
    van der Linden, Sebastian
    Janz, Andreas
    Waske, Bjoern
    Eiden, Michael
    Hostert, Patrick
    JOURNAL OF APPLIED REMOTE SENSING, 2007, 1
  • [34] Classifying High-Dimensional Text and Web Data using Very Short Patterns
    Malik, Hassan H.
    Kender, John R.
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 923 - 928
  • [35] Kernel trees for support vector machines
    Methasate, Ithipan
    Theeramunkong, Thanaruk
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (10) : 1550 - 1556
  • [36] Support vector machine and optimal parameter selection for high-dimensional imbalanced data
    Nakayama, Yugo
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (11) : 6739 - 6754
  • [37] Parallel Clifford Support Vector Machines Using the Gaussian Kernel
    Lopez-Gonzalez, Gehova
    Arana-Daniel, Nancy
    Bayro-Corrochano, Eduardo
    ADVANCES IN APPLIED CLIFFORD ALGEBRAS, 2017, 27 (01) : 647 - 660
  • [38] Parallel Clifford Support Vector Machines Using the Gaussian Kernel
    Gehová López-González
    Nancy Arana-Daniel
    Eduardo Bayro-Corrochano
    Advances in Applied Clifford Algebras, 2017, 27 : 647 - 660
  • [39] Using Polynomial Kernel Support Vector Machines for Speaker Verification
    Yaman, Sibel
    Pelecanos, Jason
    IEEE SIGNAL PROCESSING LETTERS, 2013, 20 (09) : 901 - 904
  • [40] BALANCED VS IMBALANCED TRAINING DATA: CLASSIFYING RAPIDEYE DATA WITH SUPPORT VECTOR MACHINES
    Ustuner, M.
    Sanli, F. B.
    Abdikan, S.
    XXIII ISPRS CONGRESS, COMMISSION VII, 2016, 41 (B7) : 379 - 384