Multiple Kernel Learning With Minority Oversampling for Classifying Imbalanced Data

Cited by: 6
Authors
Wang, Ling [1 ]
Wang, Hongqiao [1 ]
Fu, Guangyuan [1 ]
Affiliations
[1] Rocket Force Univ Engn, Dept Informat Engn, Xian 710025, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Sensitivity; Shape; Classification algorithms; Kernel; Task analysis; Standards; Class imbalanced learning; multiple kernel learning; nonlinear oversampling; cost-sensitive
DOI
10.1109/ACCESS.2020.3046604
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Class imbalance problems, arising from sampling bias or measurement error, occur frequently in real-world pattern classification tasks. Traditional classifiers focus on overall classification accuracy and neglect the minority class, which can degrade classification performance. Moreover, existing oversampling algorithms generally make specific assumptions when balancing class sizes and do not sufficiently account for irregularities present in imbalanced data; as a result, these methods perform well only on certain benchmarks. In this paper, by combining minority oversampling with cost-sensitive learning, we propose multiple kernel learning with minority oversampling (MKLMO) to efficiently handle class imbalance problems involving small disjuncts, class overlap, and nonlinear shapes. Unlike existing methods, which first oversample the minority class and then deploy a standard classifier on the rebalanced data, the proposed MKLMO generates synthetic instances and trains the classifier simultaneously in the same feature space. Specifically, we define a distance metric in the optimal feature space obtained by multiple kernel learning and use the kernel trick to expand the original Gram matrix. Moreover, we assign different weights to instances, based on the imbalance ratio, to reduce the classifier's bias towards the majority class. To evaluate the proposed MKLMO method, experiments are performed on nine artificial and twenty-one real-world datasets. The experimental results show that our algorithm significantly outperforms baseline algorithms in terms of the geometric mean (G-mean), especially in the presence of data irregularities.
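The abstract combines three generic ingredients: interpolation-based minority oversampling, imbalance-ratio instance weights (cost-sensitive learning), and evaluation by G-mean. The NumPy sketch below illustrates these standard building blocks only, not the paper's kernel-space MKLMO algorithm (which oversamples inside the learned feature space via the Gram matrix); all function names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like_oversample(X_min, n_new, k=3, rng=rng):
    """SMOTE-style interpolation: each synthetic point lies between a
    minority sample and one of its k nearest minority neighbors."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]        # skip the point itself
        j = rng.choice(nbrs)
        delta = rng.random()                 # interpolation factor in [0, 1)
        out.append(X_min[i] + delta * (X_min[j] - X_min[i]))
    return np.array(out)

def imbalance_weights(y):
    """Instance weights inversely proportional to class frequency, so each
    class contributes equally to a weighted loss (cost-sensitive learning)."""
    n = len(y)
    w = np.empty(n, dtype=float)
    for c in np.unique(y):
        mask = (y == c)
        w[mask] = n / (2.0 * mask.sum())     # imbalance-ratio weighting
    return w

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (binary case)."""
    recalls = [ (y_pred[y_true == c] == c).mean() for c in np.unique(y_true) ]
    return float(np.sqrt(np.prod(recalls)))
```

For example, with `y_true = [0, 0, 1, 1]` and `y_pred = [0, 0, 1, 0]` the per-class recalls are 1.0 and 0.5, giving a G-mean of about 0.707; a plain accuracy of 0.75 would hide the weak minority-class recall, which is exactly why G-mean is the preferred metric for imbalanced data.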
Pages: 565-580 (16 pages)
Related Papers (50 in total)
  • [21] Minority oversampling for imbalanced time series classification
    Zhu, Tuanfei
    Luo, Cheng
    Zhang, Zhihong
    Li, Jing
    Ren, Siqi
    Zeng, Yifu
    KNOWLEDGE-BASED SYSTEMS, 2022, 247
  • [22] Classifying Multiple Imbalanced Attributes in Relational Data
    Ghanem, Amal S.
    Venkatesh, Svetha
    West, Geoff
    AI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, 5866 : 220 - 229
  • [23] A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning
    Elreedy, Dina
    Atiya, Amir F.
    Kamalov, Firuz
    MACHINE LEARNING, 2024, 113 (07) : 4903 - 4923
  • [24] MI-MOTE: Multiple imputation-based minority oversampling technique for imbalanced and incomplete data classification
    Shin, Kyoham
    Han, Jongmin
    Kang, Seokho
    INFORMATION SCIENCES, 2021, 575 : 80 - 89
  • [25] A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios
    Tripathi, Ayush
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 10650 - 10657
  • [26] Local distribution-based adaptive minority oversampling for imbalanced data classification
    Wang, Xinyue
    Xu, Jian
    Zeng, Tieyong
    Jing, Liping
    NEUROCOMPUTING, 2021, 422 : 200 - 213
  • [27] Importance-SMOTE: a synthetic minority oversampling method for noisy imbalanced data
    Liu, Jie
    SOFT COMPUTING, 2022, 26 (03) : 1141 - 1163
  • [29] Hybrid Oversampling Technique Based on Star Topology and Rejection Methodology for Classifying Imbalanced Data
    Lee, Chaekyu
    Kim, Jaekwang
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 1217 - 1226
  • [30] On the Role of Cost-Sensitive Learning in Imbalanced Data Oversampling
    Krawczyk, Bartosz
    Wozniak, Michal
    COMPUTATIONAL SCIENCE - ICCS 2019, PT III, 2019, 11538 : 180 - 191