Improved LightGBM for Extremely Imbalanced Data and Application to Credit Card Fraud Detection

Cited by: 1
Authors
Zhao, Xiaosong [1]
Liu, Yong [1]
Zhao, Qiangfu [1]
Affiliation
[1] Univ Aizu, Grad Sch, Aizu Wakamatsu, Fukushima 9658580, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Class balancing cost-harmonization LightGBM; cost-sensitive; credit card fraud detection; extremely imbalanced data; interpretability; oversampling; SMOTE; CHALLENGES;
DOI
10.1109/ACCESS.2024.3487212
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Credit card fraud (CCF) is a significant threat to cardholders and financial institutions. CCF detection against this threat is challenging due to extremely imbalanced data (EID). EID involves extremely few instances of fraud for training and an extremely high risk of overlooking fraud. While class balancing or oversampling techniques can address the former problem by punishing negative classes or augmenting the positive data, they do not mitigate the latter. In contrast, the cost-sensitive learning approach targets only the high risk of false negative errors. Therefore, existing approaches are insufficient to solve all the issues of the EID problem. Based on the LightGBM (Light Gradient Boosting Machine) framework, this study introduces two novel machine-learning methods: the class balancing cost-harmonization LightGBM (CB-CHL-LightGBM) and the oversampling cost-harmonization LightGBM (OS-CHL-LightGBM). The new approaches combine class balancing or oversampling technology with LightGBM to solve the EID problem comprehensively. They enhance the efficacy of LightGBM in CCF detection scenarios. Experimental results on three CCF datasets indicate that the two proposed methods outperform LightGBM in several crucial performance metrics. For example, compared with the original LightGBM, CB-CHL-LightGBM or OS-CHL-LightGBM can increase the F2-score from 0.77 to 0.83 for the first dataset, from 0.77 to 0.86 for the second dataset, and from 0.70 to 0.82 for the third dataset. However, adding class balancing, oversampling, and cost-harmonization loss separately to LightGBM may not obtain better results.
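The abstract reports its gains as F2-scores. As a minimal sketch of why that metric suits the "high risk of overlooking fraud" it describes, the snippet below computes the standard F-beta score from confusion-matrix counts. The function name `fbeta_from_counts` and the example counts are hypothetical illustrations, not values or code from the paper, and the paper's own evaluation protocol is not given in this record.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Standard F-beta score from confusion-matrix counts.

    With beta = 2, recall is weighted more heavily than precision, so
    missed frauds (false negatives) lower the score more than false alarms.
    Hypothetical helper for illustration only.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives and no positive predictions.
    return (1 + b2) * tp / denom if denom else 0.0


# Illustrative (made-up) counts: the same number of errors costs more
# when the errors are missed frauds than when they are false alarms.
missed_frauds = fbeta_from_counts(tp=80, fp=0, fn=20)
false_alarms = fbeta_from_counts(tp=80, fp=20, fn=0)
print(missed_frauds < false_alarms)
```

Under these assumed counts, 20 missed frauds reduce the F2-score more than 20 false alarms do, which is the asymmetry a fraud detector is usually tuned for.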
Pages: 159316-159335
Page count: 20