Mitigating false negatives in imbalanced datasets: An ensemble approach

Cited by: 0
Authors
Vasconcelos, Marcelo [1]
Cavique, Luis [2, 3]
Affiliations
[1] Tribunal Contas Dist Fed, Brasilia, Brazil
[2] Univ Aberta, Lisbon, Portugal
[3] Lasige FCUL, Lisbon, Portugal
Keywords
Imbalanced dataset; False negative rate; Ensemble algorithms; Fraud detection; Set covering problem; SMOTE
DOI
10.1016/j.eswa.2024.125674
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced datasets present a challenge in machine learning, especially in binary classification scenarios where one class significantly outweighs the other. This imbalance often leads to models favoring the majority class, resulting in inadequate predictions for the minority class, specifically in false negatives. In response to this issue, this work introduces the MinFNR ensemble algorithm, designed to minimize False Negative Rates (FNR) in imbalanced datasets. The new approach strategically combines data-level, algorithmic-level, and hybrid-level approaches to enhance overall predictive capabilities while minimizing computational resources using the Set Covering Problem (SCP) formulation. Through a comprehensive evaluation of diverse datasets, MinFNR consistently outperforms individual algorithms, showing its potential for applications where the cost of false negatives is substantial, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real imbalanced scenarios.
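The abstract names two ingredients, the False Negative Rate, FNR = FN / (FN + TP), and a Set Covering Problem formulation, without giving details in this record. The following Python sketch is only an illustration of how those two ideas can fit together, not the authors' MinFNR algorithm. It assumes that each base classifier "covers" the positive instances it detects, uses the standard greedy heuristic for set covering to pick a small subset of classifiers, and combines them with a hypothetical union (logical OR) vote. The model names and predictions are made-up toy data.

```python
from typing import Dict, List


def false_negative_rate(y_true: List[int], y_pred: List[int]) -> float:
    """FNR = FN / (FN + TP), computed over the positive class (label 1)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if (fn + tp) else 0.0


def greedy_cover(y_true: List[int], preds: Dict[str, List[int]]) -> List[str]:
    """Greedy set-cover heuristic (an assumption, not the paper's solver).

    Each classifier covers the positive instances it labels as 1. Repeatedly
    pick the classifier that detects the most still-uncovered positives.
    """
    positives = {i for i, t in enumerate(y_true) if t == 1}
    detected = {
        name: {i for i in positives if p[i] == 1} for name, p in preds.items()
    }
    uncovered = set(positives)
    chosen: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(detected), key=lambda n: len(detected[n] & uncovered))
        gain = detected[best] & uncovered
        if not gain:
            break  # remaining positives are missed by every classifier
        chosen.append(best)
        uncovered -= gain
    return chosen


def union_vote(preds: Dict[str, List[int]], chosen: List[str], n: int) -> List[int]:
    """Hypothetical combination rule: positive if any chosen model says so."""
    return [int(any(preds[name][i] == 1 for name in chosen)) for i in range(n)]


# Toy data: 4 positives among 10 instances, four made-up base classifiers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = {
    "model_a": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "model_b": [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "model_c": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "model_d": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}

chosen = greedy_cover(y_true, preds)
ensemble = union_vote(preds, chosen, len(y_true))

print("selected:", chosen)
print("FNR of model_a alone:", false_negative_rate(y_true, preds["model_a"]))
print("FNR of ensemble:", false_negative_rate(y_true, ensemble))
```

In this toy case `model_c` is skipped because `model_a` already detects everything it detects, which is the sense in which a covering formulation can save computation. A union vote lowers the FNR at the price of more false positives (instance 5 here), so a real system would need to weigh that trade-off.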
Pages: 11