Mitigating false negatives in imbalanced datasets: An ensemble approach

Authors
Vasconcelos, Marcelo [1]
Cavique, Luis [2, 3]
Affiliations
[1] Tribunal Contas Dist Fed, Brasilia, Brazil
[2] Univ Aberta, Lisbon, Portugal
[3] Lasige FCUL, Lisbon, Portugal
Keywords
Imbalanced dataset; False negative rate; Ensemble algorithms; Fraud detection; Set covering problem; SMOTE
DOI
10.1016/j.eswa.2024.125674
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced datasets present a challenge in machine learning, especially in binary classification scenarios where one class significantly outnumbers the other. This imbalance often leads models to favor the majority class, resulting in poor predictions for the minority class, in particular a high number of false negatives. In response to this issue, this work introduces the MinFNR ensemble algorithm, designed to minimize the False Negative Rate (FNR) in imbalanced datasets. MinFNR combines data-level, algorithmic-level, and hybrid-level techniques to enhance overall predictive capability, and it uses a Set Covering Problem (SCP) formulation to limit the computational resources required. In an evaluation on diverse datasets, MinFNR consistently outperforms individual algorithms, showing its potential for applications where the cost of false negatives is substantial, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real imbalanced scenarios.
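The abstract does not spell out how the SCP formulation is used, so the following is only an illustrative sketch of the general idea, not the authors' MinFNR algorithm. It assumes each candidate model "covers" the true positives it detects, uses a simple greedy set-cover heuristic to pick a small subset of models, and combines them with a union (logical OR) vote. The names `false_negative_rate`, `greedy_cover`, `union_vote`, and the toy predictions are all hypothetical.

```python
from typing import Dict, List, Sequence


def false_negative_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FNR = FN / (FN + TP), measured on the positive (minority) class."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if (fn + tp) else 0.0


def greedy_cover(y_true: Sequence[int], preds: Dict[str, Sequence[int]]) -> List[str]:
    """Greedy set-cover heuristic (an assumption, not the paper's solver):
    repeatedly pick the model that detects the most still-missed positives."""
    positives = {i for i, t in enumerate(y_true) if t == 1}
    covers = {name: {i for i in positives if p[i] == 1} for name, p in preds.items()}
    uncovered = set(positives)
    chosen: List[str] = []
    while uncovered:
        best = max(covers, key=lambda name: len(covers[name] & uncovered))
        gain = covers[best] & uncovered
        if not gain:  # no remaining model detects any missed positive
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


def union_vote(preds: Dict[str, Sequence[int]], chosen: List[str], n: int) -> List[int]:
    """Predict positive if any chosen model predicts positive."""
    return [int(any(preds[m][i] == 1 for m in chosen)) for i in range(n)]


# Toy data (hypothetical): 4 positives among 10 samples, three candidate models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = {
    "model_a": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "model_b": [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
    "model_c": [1, 0, 1, 1, 0, 0, 0, 0, 1, 0],
}

chosen = greedy_cover(y_true, preds)
ensemble = union_vote(preds, chosen, len(y_true))
print(chosen, false_negative_rate(y_true, ensemble))
```

In this toy case two of the three models suffice to detect every positive. A union vote lowers the FNR at the price of more false positives, so a real evaluation would track both error types.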
Pages: 11