Mitigating false negatives in imbalanced datasets: An ensemble approach

Cited by: 0
Authors
Vasconcelos, Marcelo [1]
Cavique, Luis [2, 3]
Affiliations
[1] Tribunal Contas Dist Fed, Brasilia, Brazil
[2] Univ Aberta, Lisbon, Portugal
[3] Lasige FCUL, Lisbon, Portugal
Keywords
Imbalanced dataset; False negative rate; Ensemble algorithms; Fraud detection; Set covering problem; SMOTE
DOI
10.1016/j.eswa.2024.125674
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced datasets present a challenge in machine learning, especially in binary classification, where one class significantly outnumbers the other. This imbalance often leads models to favor the majority class, producing poor predictions for the minority class and, in particular, many false negatives. In response to this issue, this work introduces the MinFNR ensemble algorithm, designed to minimize the False Negative Rate (FNR) in imbalanced datasets. The approach combines data-level, algorithm-level, and hybrid methods to improve overall predictive performance, and uses a Set Covering Problem (SCP) formulation to minimize the computational resources required. In an evaluation on diverse datasets, MinFNR consistently outperforms the individual algorithms, showing its potential for applications where the cost of false negatives is substantial, such as fraud detection and medical diagnosis. This work also contributes to ongoing efforts to improve the reliability and effectiveness of machine learning algorithms in real-world imbalanced scenarios.
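The abstract does not spell out how the SCP formulation chooses ensemble members. The Python sketch below shows one plausible reading, under stated assumptions; it is not the authors' MinFNR implementation. Each base classifier is assumed to "cover" the positive (minority-class) validation samples it detects. A greedy heuristic, swapped in here for an exact SCP solver, then picks a small subset of classifiers that together cover every detectable positive. Finally, the selected classifiers are combined with an OR rule: a sample is flagged if any member flags it. The function names (`false_negative_rate`, `greedy_cover_selection`, `or_combine`), the OR combination rule, and the classifier names in the toy data are all hypothetical.

```python
"""Hypothetical sketch: set-cover selection of ensemble members to lower FNR.

Not the MinFNR reference implementation; the coverage definition, the greedy
heuristic, and the OR combination rule are illustrative assumptions.
"""


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP) for the positive (minority) class labelled 1."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fn / (fn + tp) if fn + tp else 0.0


def greedy_cover_selection(y_true, predictions):
    """Pick classifiers so every positive detected by some model is covered.

    `predictions` maps a classifier name to its 0/1 predictions on a
    validation set. A classifier "covers" the positive samples it predicts
    as 1. The greedy rule repeatedly takes the classifier covering the most
    still-uncovered positives (ties go to the earliest name in the dict).
    """
    positives = {i for i, t in enumerate(y_true) if t == 1}
    covers = {
        name: {i for i in positives if preds[i] == 1}
        for name, preds in predictions.items()
    }
    # Positives that no classifier detects can never be covered, so exclude them.
    uncovered = set().union(*covers.values())
    selected = []
    while uncovered:
        best = max(covers, key=lambda name: len(covers[name] & uncovered))
        selected.append(best)
        uncovered -= covers[best]
    return selected


def or_combine(predictions, selected):
    """Flag a sample as positive if any selected classifier flags it."""
    n_samples = len(next(iter(predictions.values())))
    return [
        int(any(predictions[name][i] == 1 for name in selected))
        for i in range(n_samples)
    ]


# Toy validation data; the classifier names are placeholders.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "logreg": [1, 1, 0, 0, 0, 0, 0, 0],
    "tree":   [0, 1, 1, 0, 1, 0, 0, 0],
    "knn":    [0, 0, 0, 1, 0, 0, 1, 0],
    "svm":    [1, 0, 0, 0, 0, 0, 0, 0],
}

selected = greedy_cover_selection(y_true, predictions)
ensemble_pred = or_combine(predictions, selected)

for name, preds in predictions.items():
    print(f"{name:7s} FNR = {false_negative_rate(y_true, preds):.2f}")
print("selected:", selected)
print(f"ensemble FNR = {false_negative_rate(y_true, ensemble_pred):.2f}")
```

On this toy data, each individual classifier misses between half and three quarters of the positives. The three selected classifiers together miss none. The OR rule also adds two false positives, which is the trade-off such a scheme accepts when false negatives are the costlier error. A greedy cover only approximates the minimum cover; an exact solution would need an integer-programming solver.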
Pages: 11