Analysis of imbalanced data using cost-sensitive learning

Authors
Kim, Sojin [1]
Song, Jongwoo [1]
Affiliation
[1] Ewha Womans University, Department of Statistics, Seoul, South Korea
Keywords
Imbalanced classification; cost-sensitive learning; classification performance; hybrid classification; SMOTE
DOI
10.1080/03610926.2025.2472792
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Classification algorithms typically aim to maximize accuracy. However, when the data are highly imbalanced, accuracy may not be the most suitable metric. We argue that the most effective way to handle imbalanced cases is to minimize the total misclassification cost. Unfortunately, precise misclassification costs are often unavailable in real-world settings. To address this problem, we propose a simple and efficient search algorithm for cost-sensitive learning. We also introduce a new performance metric, imbalanced data classification performance (IDCP), which combines the F-score and the area under the curve (AUC). Using IDCP, with the imbalance ratio (IR) as a key factor, we determine the optimal weights for cost-sensitive learning. Extensive experiments show that our method can find the optimal decision boundary on imbalanced datasets. Our code is available at https://github.com/sssojin/Imbalanced_Classification
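The abstract does not spell out the search algorithm or the IDCP formula, so the following is only a rough, pure-Python sketch of the general workflow it describes, not the authors' method. Assumptions: cost-sensitive learning is modeled as logistic regression whose minority-class (label 1) loss terms are multiplied by a weight w; the candidate weights form a hypothetical grid of multiples of the imbalance ratio IR; "AUC" is taken to mean ROC AUC; and `placeholder_score`, a plain average of F1 and AUC, is a hypothetical stand-in for IDCP, whose real definition is in the paper. All function names are illustrative; the authors' repository holds their actual implementation.

```python
import math
from collections import Counter

def imbalance_ratio(y):
    """IR = majority-class count / minority-class count."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

def _sigmoid(z):
    # Numerically stable logistic function (never calls exp on a large positive value).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_weighted_logreg(X, y, w_pos, lr=0.1, epochs=2000):
    """Cost-sensitive logistic regression fitted by batch gradient descent.
    Each minority-class (y == 1) sample's loss is scaled by w_pos;
    majority-class (y == 0) samples keep weight 1."""
    d = len(X[0])
    beta = [0.0] * (d + 1)  # beta[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = _sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = (w_pos if yi == 1 else 1.0) * (p - yi)  # weighted log-loss gradient
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, X):
    return [_sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids 0/0 when nothing is predicted positive
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """ROC AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def placeholder_score(f1, auc):
    # HYPOTHETICAL stand-in: plain average of F1 and AUC, NOT the paper's IDCP formula.
    return 0.5 * (f1 + auc)

def search_weight(X_tr, y_tr, X_val, y_val, factors=(0.25, 0.5, 1.0, 2.0)):
    """Grid search over minority-class weights w = IR * factor (hypothetical grid)."""
    ir = imbalance_ratio(y_tr)
    best_w, best_score = None, -1.0
    for f in factors:
        w = ir * f
        probs = predict_proba(train_weighted_logreg(X_tr, y_tr, w), X_val)
        preds = [1 if p >= 0.5 else 0 for p in probs]
        score = placeholder_score(f1_score(y_val, preds), roc_auc(y_val, probs))
        if score > best_score:  # ties keep the earlier (smaller) weight
            best_w, best_score = w, score
    return best_w, best_score
```

For example, with 20 majority-class and 4 minority-class samples, `imbalance_ratio` returns 5.0, so this grid tries w = 1.25, 2.5, 5.0 and 10.0. In practice the score should be computed on a held-out validation split rather than on the training data, and the placeholder score should be replaced by the IDCP definition from the paper.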
Pages: 15