Flexible loss functions for binary classification in gradient-boosted decision trees: An application to credit scoring

Cited by: 10
Authors
Mushava, Jonah [1]
Murray, Michael [1]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Westville Campus,Private Bag X54001, ZA-4000 Durban, South Africa
Keywords
Class imbalance; Machine learning; Credit scoring; XGBoost; Freddie Mac; BANKRUPTCY PREDICTION; ENSEMBLE; PERFORMANCE; CHALLENGES; MODEL; RISK;
DOI
10.1016/j.eswa.2023.121876
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces flexible loss functions for binary classification in gradient-boosted decision trees (GBDT). The losses combine Dice-based and cross-entropy-based terms and offer link functions derived from either a generalized extreme value (GEV) or an exponentiated exponential logistic (EEL) distribution. Tests of 27 GBDT models built with XGBoost on a Freddie Mac mortgage loan database showed that the choice of loss function affects model performance. Specifically, when the class imbalance ratio (IR) is below 99, a skewed GEV-based link function in XGBoost improves discriminatory power and classification accuracy while retaining a simple model structure, which is particularly important in credit scoring. When class imbalance is severe, typically for IRs between 99 and 200, a more advanced loss function, composed of a symmetric hybrid loss and a link derived from a positively skewed EEL distribution, outperforms the other XGBoost variants. The accuracy gains from these extensions lower misclassification costs, most visibly when IR is below 99, and therefore raise profitability for the business. The study also highlights the transparency of GBDT, which is integral to financial applications. Researchers and practitioners can use these insights to build more accurate and discriminative machine learning models, with possible extensions to other GBDT implementations and to other machine learning techniques that depend on a loss function. The source code for the proposed approach is publicly available at https://github.com/jm-ml/flexible-losses-for-binary-classification-with-GBDT.
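To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation; the linked repository is the authoritative source. It assumes the textbook GEV cumulative distribution function as the link, a plain weighted sum of cross-entropy and soft Dice loss as the hybrid, and the usual majority-to-minority count ratio as the IR. The function names, the shape parameter `xi`, and the mixing weight `alpha` are hypothetical and chosen only for illustration.

```python
import math


def imbalance_ratio(labels):
    """Majority-to-minority class count ratio for 0/1 labels (assumed IR definition)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    minority, majority = sorted((positives, negatives))
    if minority == 0:
        raise ValueError("both classes must be present")
    return majority / minority


def gev_link(score, xi=-0.25):
    """Map a raw score to a probability with the standard GEV CDF.

    Assumption: the paper's parameterization may differ from this textbook form.
    """
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-score))  # Gumbel limit as xi -> 0
    t = 1.0 + xi * score
    if t <= 0.0:
        # Outside the support the CDF is 0 (xi > 0) or 1 (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def hybrid_loss(labels, probs, alpha=0.5, eps=1e-7):
    """Weighted sum of mean cross-entropy and soft Dice loss (illustrative form only)."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    cross_entropy = -sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(labels, clipped)
    ) / len(labels)
    overlap = sum(y * p for y, p in zip(labels, clipped))
    dice = 1.0 - (2.0 * overlap + eps) / (sum(clipped) + sum(labels) + eps)
    return alpha * cross_entropy + (1.0 - alpha) * dice


if __name__ == "__main__":
    labels = [1] + [0] * 99  # one default among 100 loans
    print("IR:", imbalance_ratio(labels))
    # Unlike the logistic link, the GEV link is asymmetric around a score of 0.
    print("GEV link at score 0:", gev_link(0.0))
    probs = [gev_link(2.0)] + [gev_link(-2.0)] * 99
    print("hybrid loss:", hybrid_loss(labels, probs))
```

Under these assumptions, a data set with one positive per 99 negatives sits exactly at the IR threshold of 99 that the abstract uses to separate the two regimes. To use such a loss as an XGBoost objective, one would additionally need its first and second derivatives with respect to the raw score, which this sketch does not derive.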
Pages: 16