Mitigating false negatives in imbalanced datasets: An ensemble approach

Authors
Vasconcelos, Marcelo [1]
Cavique, Luis [2, 3]
Affiliations
[1] Tribunal Contas Dist Fed, Brasilia, Brazil
[2] Univ Aberta, Lisbon, Portugal
[3] Lasige FCUL, Lisbon, Portugal
Keywords
Imbalanced dataset; False negative rate; Ensemble algorithms; Fraud detection; Set covering problem; SMOTE
DOI
10.1016/j.eswa.2024.125674
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced datasets pose a challenge in machine learning, especially in binary classification, where one class greatly outnumbers the other. This imbalance often leads models to favor the majority class and to predict the minority class poorly, most visibly in the form of false negatives. To address this issue, this work introduces MinFNR, an ensemble algorithm designed to minimize the False Negative Rate (FNR) on imbalanced datasets. MinFNR combines data-level, algorithm-level, and hybrid techniques to improve overall predictive performance, and it uses a Set Covering Problem (SCP) formulation to minimize the computational resources required. In an evaluation on diverse datasets, MinFNR consistently outperforms the individual algorithms, which indicates its potential for applications where false negatives are costly, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real-world imbalanced settings.
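The abstract names the ingredients of MinFNR (models drawn from data-level, algorithm-level, and hybrid techniques; an FNR objective; an SCP formulation) but not the exact procedure. The Python sketch below illustrates one plausible reading, which is an assumption and not the authors' code: treat each minority-class validation instance as an element to be covered, treat each candidate model as the set of positives it detects, choose a small covering subset of models with the standard greedy set-cover heuristic, and combine the chosen models with an OR rule. All function names and the three model labels are hypothetical, and the predictions are hard-coded rather than produced by real SMOTE-based or cost-sensitive classifiers.

```python
from typing import Dict, List, Sequence, Set


def false_negative_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FNR = FN / (FN + TP): share of true positives (label 1) predicted as 0."""
    pos_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not pos_preds:
        return 0.0  # no positives, so no false negatives are possible
    return sum(1 for p in pos_preds if p == 0) / len(pos_preds)


def greedy_set_cover(universe: Set[int], subsets: Dict[str, Set[int]]) -> List[str]:
    """Greedy heuristic for the Set Covering Problem.

    Repeatedly picks the subset that covers the most still-uncovered elements;
    ties go to the subset listed first. Stops early if nothing more can be covered.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered and subsets:
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            break  # the remaining positives are missed by every candidate model
        chosen.append(best)
        uncovered -= gain
    return chosen


def select_models(y_true: Sequence[int], predictions: Dict[str, Sequence[int]]) -> List[str]:
    """Hypothetical selection step: cover every positive validation instance."""
    positives = {i for i, t in enumerate(y_true) if t == 1}
    caught = {
        name: {i for i in positives if preds[i] == 1}
        for name, preds in predictions.items()
    }
    return greedy_set_cover(positives, caught)


def or_ensemble(selected: List[str], predictions: Dict[str, Sequence[int]], n: int) -> List[int]:
    """Predict positive if any selected model predicts positive."""
    return [int(any(predictions[m][i] == 1 for m in selected)) for i in range(n)]


# Hypothetical validation labels and hard-coded predictions from three
# illustrative models (placeholder names, not models from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "smote_rf": [1, 1, 0, 0, 0, 1, 0, 0],
    "cost_sensitive_lr": [0, 1, 1, 0, 0, 0, 0, 0],
    "balanced_bagging": [0, 0, 1, 1, 0, 0, 1, 0],
}

selected = select_models(y_true, predictions)
ensemble = or_ensemble(selected, predictions, len(y_true))
for name, preds in predictions.items():
    print(name, false_negative_rate(y_true, preds))  # each model alone: FNR 0.5
print(selected, false_negative_rate(y_true, ensemble))
```

In this toy run, each model alone misses half of the positives, while the two models chosen by the greedy cover (`smote_rf` and `balanced_bagging`) catch all four between them, so the ensemble FNR drops to 0.0. The OR rule trades false negatives for false positives: the ensemble also flags the negatives at indices 5 and 6. Greedy set cover is only an approximation, so an exact SCP solver (for example, an integer program) could be substituted where the smallest possible model set matters.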
Pages: 11