Flexible loss functions for binary classification in gradient-boosted decision trees: An application to credit scoring

Times Cited: 10
Authors
Mushava, Jonah [1 ]
Murray, Michael [1 ]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Westville Campus,Private Bag X54001, ZA-4000 Durban, South Africa
Keywords
Class imbalance; Machine learning; Credit scoring; XGBoost; Freddie Mac; BANKRUPTCY PREDICTION; ENSEMBLE; PERFORMANCE; CHALLENGES; MODEL; RISK
DOI
10.1016/j.eswa.2023.121876
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This paper introduces new flexible loss functions for binary classification in Gradient-Boosted Decision Trees (GBDT) that combine Dice-based and cross-entropy-based losses and offer link functions from either a generalized extreme value (GEV) or exponentiated exponential logistic (EEL) distribution. Testing 27 different GBDT models using XGBoost on a Freddie Mac mortgage loan database showed that the choice of loss function matters. Specifically, when the class imbalance ratio (IR) is less than 99, using a skewed GEV distribution-based link function in XGBoost enhances discriminatory power and classification accuracy while retaining a simple model structure, which is particularly important in credit scoring applications. In cases where class imbalances are severe, typically between IRs of 99 and 200, we found that an advanced loss function, composed of a symmetric hybrid loss and a link derived from a positively skewed EEL distribution, outperforms other XGBoost variants. The accuracy improvements from these proposed extensions translate into lower misclassification costs, most evident when the IR is below 99, and hence higher profitability for the business. Furthermore, the study highlights the transparency associated with GBDT, which is an integral requirement in financial applications. Researchers and practitioners can use these insights to create more accurate and discriminative machine learning models, with possible extensions to other GBDT implementations and machine learning techniques that rely on loss functions. The source code for the proposed approach is publicly available at https://github.com/jm-ml/flexible-losses-for-binary-classification-with-GBDT.
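The paper's exact GEV- and EEL-linked losses are defined in the full text; as an illustration of the general mechanism it describes (a hybrid Dice plus cross-entropy objective plugged into GBDT), the following is a minimal NumPy sketch using a standard logistic link. The names `hybrid_obj`, `alpha`, and `eps` are illustrative, not the authors'; the Dice Hessian is a crude positive surrogate (boosting libraries such as XGBoost only require positive Hessian values for a valid custom objective), not the true second derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hybrid_obj(alpha=0.5, eps=1.0):
    """Return a gradient/Hessian function for a hybrid loss:
    alpha * cross-entropy + (1 - alpha) * (1 - soft Dice)."""
    def obj(raw_score, y):
        p = sigmoid(raw_score)

        # Cross-entropy part: per-sample grad/hess w.r.t. the raw score.
        grad_bce = p - y
        hess_bce = p * (1.0 - p)

        # Soft Dice over the batch (non-separable: depends on global sums).
        inter = np.sum(p * y)
        denom = np.sum(p) + np.sum(y) + eps
        # d(dice)/dp_i for dice = (2*inter + eps) / denom
        ddice_dp = (2.0 * y * denom - (2.0 * inter + eps)) / denom**2
        # Loss is (1 - dice); chain rule through the sigmoid.
        grad_dice = -ddice_dp * p * (1.0 - p)
        # Crude positive surrogate Hessian (NOT the true second derivative).
        hess_dice = np.maximum(np.abs(ddice_dp) * p * (1.0 - p), 1e-6)

        grad = alpha * grad_bce + (1.0 - alpha) * grad_dice
        hess = alpha * hess_bce + (1.0 - alpha) * hess_dice
        return grad, hess
    return obj
```

To use this with XGBoost one would wrap it to match the library's custom-objective signature, roughly `def obj(preds, dtrain): return hybrid_obj()(preds, dtrain.get_label())`, passed via the `obj` argument of `xgb.train`. Swapping the sigmoid for a GEV or EEL CDF (and re-deriving the chain-rule terms) is how the paper's link-function variants would slot into the same template.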
Pages: 16