Large-Scale Learning with Structural Kernels for Class-Imbalanced Datasets

Cited by: 0
Authors
Severyn, Aliaksei [1 ]
Moschitti, Alessandro [1 ]
Affiliations
[1] Univ Trento, Dept Comp Sci & Engn, I-38123 Povo, TN, Italy
Source
ETERNAL SYSTEMS | 2012 / Vol. 255
Keywords
Machine Learning; Kernel Methods; Structural Kernels; Support Vector Machine; Natural Language Processing
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Much of the success of machine learning can be attributed to the ability of learning methods to adequately represent, extract, and exploit the inherent structure present in the data of interest. Kernel methods are a rich family of techniques that build on this principle. Domain-specific kernels can exploit rich structural information in the input data to deliver state-of-the-art results in many application areas, e.g., natural language processing (NLP), bioinformatics, computer vision, and many others. The use of kernels to capture relationships in the input data has made the Support Vector Machine (SVM) algorithm the state-of-the-art tool in many of these areas. Nevertheless, kernel learning remains computationally expensive. The contribution of this paper is to make learning with structural kernels, e.g., tree kernels, more applicable to real-world large-scale tasks. More specifically, we propose two important enhancements of the approximate cutting plane algorithm for training SVMs with structural kernels: (i) a new sampling strategy to handle the class-imbalance problem; and (ii) a parallel implementation, which makes training scale almost linearly with the number of CPUs. We also show that the theoretical convergence bounds are preserved for the improved algorithm. The experimental evaluations demonstrate the soundness of our approach and the feasibility of large-scale learning with structural kernels.
Pages: 34 - 41
Number of pages: 8
Related papers
50 in total
  • [31] Weight Decision Algorithm for Oversampling Technique on Class-Imbalanced Learning
    Kang, Young-Il
    Won, Sangchul
    INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2010), 2010 : 182 - 186
  • [32] Research on classification method of high-dimensional class-imbalanced datasets based on SVM
    Chunkai Zhang
    Ying Zhou
    Jianwei Guo
    Guoquan Wang
    Xuan Wang
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 1765 - 1778
  • [33] Effective data-balancing methods for class-imbalanced genotoxicity datasets using machine learning algorithms and molecular fingerprints
    Bae, Su-Yong
    Lee, Jonga
    Jeong, Jaeseong
    Lim, Changwon
    Choi, Jinhee
    COMPUTATIONAL TOXICOLOGY, 2021, 20
  • [34] Research on classification method of high-dimensional class-imbalanced datasets based on SVM
    Zhang, Chunkai
    Zhou, Ying
    Guo, Jianwei
    Wang, Guoquan
    Wang, Xuan
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (07) : 1765 - 1778
  • [35] Sparse and online null proximal discriminant analysis for one class learning in large-scale datasets
    Dufrenois, Franck
    Hamad, Denis
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [36] Learning sample representativeness for class-imbalanced multi-label classification
    Zhang, Yu
    Cao, Sichen
    Mi, Siya
    Bian, Yali
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (02)
  • [37] On Supervised Class-Imbalanced Learning: An Updated Perspective and Some Key Challenges
    Das S.
    Mullick S.S.
    Zelinka I.
    IEEE Transactions on Artificial Intelligence, 2022, 3 (06) : 973 - 993
  • [38] Weed recognition using deep learning techniques on class-imbalanced imagery
    Hasan, A. S. M. Mahmudul
    Sohel, Ferdous
    Diepeveen, Dean
    Laga, Hamid
    Jones, Michael G. K.
    CROP & PASTURE SCIENCE, 2023, 74 (06) : 628 - 644
  • [39] Boosting for class-imbalanced datasets using genetically evolved supervised non-linear projections
    García-Pedrajas N.
    García-Osorio C.
    Progress in Artificial Intelligence, 2013, 2 (1) : 29 - 44
  • [40] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
    Maldonado, Sebastian
    Lopez, Julio
    APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105