Improved LightGBM for Extremely Imbalanced Data and Application to Credit Card Fraud Detection

Cited by: 1
Authors
Zhao, Xiaosong [1]
Liu, Yong [1]
Zhao, Qiangfu [1]
Affiliations
[1] Univ Aizu, Grad Sch, Aizu Wakamatsu, Fukushima 9658580, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Class balancing cost-harmonization LightGBM; cost-sensitive; credit card fraud detection; extremely imbalanced data; interpretability; oversampling; SMOTE; CHALLENGES
DOI
10.1109/ACCESS.2024.3487212
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Credit card fraud (CCF) is a significant threat to cardholders and financial institutions. CCF detection against this threat is challenging due to extremely imbalanced data (EID). EID involves extremely few instances of fraud for training and an extremely high risk of overlooking fraud. While class balancing or oversampling techniques can address the former problem by punishing negative classes or augmenting the positive data, they do not mitigate the latter. In contrast, the cost-sensitive learning approach targets only the high risk of false negative errors. Therefore, existing approaches are insufficient to solve all the issues of the EID problem. Based on the LightGBM (Light Gradient Boosting Machine) framework, this study introduces two novel machine-learning methods: the class balancing cost-harmonization LightGBM (CB-CHL-LightGBM) and the oversampling cost-harmonization LightGBM (OS-CHL-LightGBM). The new approaches combine class balancing or oversampling technology with LightGBM to solve the EID problem comprehensively. They enhance the efficacy of LightGBM in CCF detection scenarios. Experimental results on three CCF datasets indicate that the two proposed methods outperform LightGBM in several crucial performance metrics. For example, compared with the original LightGBM, CB-CHL-LightGBM or OS-CHL-LightGBM can increase the F2-score from 0.77 to 0.83 for the first dataset, from 0.77 to 0.86 for the second dataset, and from 0.70 to 0.82 for the third dataset. However, adding class balancing, oversampling, and cost-harmonization loss separately to LightGBM may not obtain better results.
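The abstract reports results in terms of the F2-score and contrasts class balancing with cost-sensitive learning, so a small sketch of both ideas may help. The code below is an illustration only: it is not the cost-harmonization loss of the paper, whose formula the abstract does not give. It shows the standard F-beta formula computed from confusion counts, and the gradient and Hessian of an ordinary class-weighted binary log loss, which is the general shape a custom objective takes in gradient-boosting frameworks such as LightGBM. The function names `fbeta_score` and `weighted_logloss_grad_hess` and the parameter `pos_weight` are hypothetical.

```python
import math


def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def weighted_logloss_grad_hess(raw_scores, labels, pos_weight):
    """Gradient and Hessian of a class-weighted binary log loss.

    Assumption: this is a generic cost-sensitive loss, NOT the paper's
    cost-harmonization loss. Errors on the positive (fraud) class are
    multiplied by pos_weight; negatives keep weight 1.
    """
    grads, hess = [], []
    for s, y in zip(raw_scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))      # sigmoid of the raw score
        w = pos_weight if y == 1 else 1.0   # per-sample class weight
        grads.append(w * (p - y))           # d(loss)/d(score)
        hess.append(w * p * (1.0 - p))      # d2(loss)/d(score)2
    return grads, hess


# Two hypothetical detectors with the same number of caught frauds (tp = 80):
# the second misses fewer frauds but raises more false alarms.
f2_a = fbeta_score(tp=80, fp=20, fn=20)
f2_b = fbeta_score(tp=80, fp=40, fn=10)
print(f2_a, f2_b)

# A missed fraud (label 1, score 0) pulls ten times harder than a
# misjudged legitimate transaction (label 0, score 0) when pos_weight = 10.
print(weighted_logloss_grad_hess([0.0, 0.0], [1, 0], pos_weight=10.0))
```

With beta = 2 a false negative costs four times as much as a false positive in the denominator, which is why the F2-score suits a setting where overlooking fraud is the dominant risk.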
Pages: 159316-159335
Page count: 20