Flexible loss functions for binary classification in gradient-boosted decision trees: An application to credit scoring

Times Cited: 10
Authors
Mushava, Jonah [1 ]
Murray, Michael [1 ]
Affiliations
[1] Univ KwaZulu Natal, Sch Math Stat & Comp Sci, Westville Campus,Private Bag X54001, ZA-4000 Durban, South Africa
Keywords
Class imbalance; Machine learning; Credit scoring; XGBoost; Freddie Mac; BANKRUPTCY PREDICTION; ENSEMBLE; PERFORMANCE; CHALLENGES; MODEL; RISK
DOI
10.1016/j.eswa.2023.121876
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This paper introduces new flexible loss functions for binary classification in Gradient-Boosted Decision Trees (GBDT) that combine Dice-based and cross-entropy-based losses and offer link functions from either a generalized extreme value (GEV) or exponentiated exponential logistic (EEL) distribution. Testing 27 different GBDT models using XGBoost on a Freddie Mac mortgage loan database showed that the choice of loss function matters. Specifically, when the class imbalance ratio (IR) is less than 99, using a skewed GEV distribution-based link function in XGBoost enhances discriminatory power and classification accuracy while retaining a simple model structure, which is particularly important in credit scoring applications. In cases where class imbalances are severe, typically between IRs of 99 and 200, we found that an advanced loss function, composed of a symmetric hybrid loss and a link derived from a positively skewed EEL distribution, outperforms other XGBoost variants. The accuracy improvements from these proposed extensions translate into lower misclassification costs, most evident when the IR is below 99, and hence higher profitability for the business. Furthermore, the study highlights the transparency associated with GBDT, which is an integral requirement in financial applications. Researchers and practitioners can use these insights to create more accurate and discriminative machine learning models, with possible extensions to other GBDT implementations and machine learning techniques that rely on loss functions. The source code for the proposed approach is publicly available at https://github.com/jm-ml/flexible-losses-for-binary-classification-with-GBDT.
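The paper's exact GEV- and EEL-linked losses are defined in the full text; as an illustration of the general mechanism it describes (a hybrid Dice plus cross-entropy objective plugged into GBDT), the following is a minimal NumPy sketch using a standard logistic link. The names `hybrid_obj`, `alpha`, and `eps` are illustrative, not the authors'; the Dice Hessian is a crude positive surrogate (boosting libraries such as XGBoost only require positive Hessian values for a valid custom objective), not the true second derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hybrid_obj(alpha=0.5, eps=1.0):
    """Return a gradient/Hessian function for a hybrid loss:
    alpha * cross-entropy + (1 - alpha) * (1 - soft Dice)."""
    def obj(raw_score, y):
        p = sigmoid(raw_score)

        # Cross-entropy part: per-sample grad/hess w.r.t. the raw score.
        grad_bce = p - y
        hess_bce = p * (1.0 - p)

        # Soft Dice over the batch (non-separable: depends on global sums).
        inter = np.sum(p * y)
        denom = np.sum(p) + np.sum(y) + eps
        # d(dice)/dp_i for dice = (2*inter + eps) / denom
        ddice_dp = (2.0 * y * denom - (2.0 * inter + eps)) / denom**2
        # Loss is (1 - dice); chain rule through the sigmoid.
        grad_dice = -ddice_dp * p * (1.0 - p)
        # Crude positive surrogate Hessian (NOT the true second derivative).
        hess_dice = np.maximum(np.abs(ddice_dp) * p * (1.0 - p), 1e-6)

        grad = alpha * grad_bce + (1.0 - alpha) * grad_dice
        hess = alpha * hess_bce + (1.0 - alpha) * hess_dice
        return grad, hess
    return obj
```

To use this with XGBoost one would wrap it to match the library's custom-objective signature, roughly `def obj(preds, dtrain): return hybrid_obj()(preds, dtrain.get_label())`, passed via the `obj` argument of `xgb.train`. Swapping the sigmoid for a GEV or EEL CDF (and re-deriving the chain-rule terms) is how the paper's link-function variants would slot into the same template.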
Pages: 16