Subagging for credit scoring models

Times cited: 135

Authors:

Paleologo, Giuseppe [2]

Elisseeff, Andre [1]

Antonini, Gianluca [1]

Affiliations:

[1] IBM Res GmbH, Zurich Res Lab, CH-8803 Ruschlikon, Switzerland

[2] IBM Global Financing Serv, Armonk, NY USA

Source:

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH | 2010 / Vol. 201 / No. 2

Keywords:

Risk analysis; Credit scoring; Classification; Decision Support Systems

DOI:

10.1016/j.ejor.2009.03.008

Chinese Library Classification (CLC) number:

C93 [Management science]

Subject classification codes:

12; 1201; 1202; 120202

Abstract:

The logistic regression framework has long been the most widely used statistical method for assessing customer credit risk. Recently, a more pragmatic approach has been adopted, in which the primary concern is predicting credit risk rather than explaining it. In this context, several classification techniques, such as support vector machines, have been shown to perform well on credit scoring. While the search for better classifiers is an important research topic, the methodology chosen in real-world applications must also deal with the challenges posed by the real-world data collected in the industry. Such data are often highly unbalanced, part of the information can be missing, and common hypotheses, such as the i.i.d. assumption, can be violated. In this paper we present a case study based on a sample of IBM Italian customers, which presents all the challenges mentioned above. The main objective is to build and validate robust models able to handle missing information, class imbalance and non-i.i.d. data points. We define a missing-data imputation method and propose the use of an ensemble classification technique, subagging, which is particularly suitable for highly unbalanced data such as credit scoring data. Both the imputation and subagging steps are embedded in a customized cross-validation loop that handles dependencies between different credit requests. The methodology has been applied using several classifiers (kernel support vector machines, nearest neighbors, decision trees, Adaboost) and their subagged versions. Subagging improves the performance of the base classifier, and we show that subagged decision trees achieve better performance while keeping the model simple and reasonably interpretable. © 2009 Elsevier B.V. All rights reserved.
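The abstract describes three interacting steps: imputing missing values, subagging on class-unbalanced data, and a cross-validation loop that respects dependencies between credit requests. A minimal Python sketch of how these pieces could be wired together is given here. It rests on assumptions that are not taken from the paper: subagging is implemented as averaging base models trained on class-balanced subsamples drawn without replacement; a decision stump stands in for the base classifier; per-feature mean imputation stands in for the authors' own imputation method; and dependencies are handled by keeping all requests of one customer in the same fold. All function names (`fit_stump`, `subag`, `fit_mean_imputer`, `grouped_folds`, `cross_val_scores`), their parameters and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def fit_stump(X, y):
    """Assumed base learner: a one-level decision tree chosen by fewest training errors."""
    best = None  # (errors, feature, threshold, direction)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for d in (1, -1):  # d=1: predict 1 if x[j] > t; d=-1: predict 1 if x[j] < t
                errors = sum(int(d * (row[j] - t) > 0) != label for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, d)
    _, j, t, d = best
    return lambda row: int(d * (row[j] - t) > 0)


def subag(X, y, n_models=25, frac=0.5, seed=0):
    """Subagging sketch: average base models fit on class-balanced subsamples
    drawn WITHOUT replacement (assumed scheme, not necessarily the paper's)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]  # minority ("bad") class
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("training data must contain both classes")
    k = max(1, int(frac * min(len(pos), len(neg))))  # rows drawn per class
    models = []
    for _ in range(n_models):
        idx = rng.sample(pos, k) + rng.sample(neg, k)
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    # Score in [0, 1]: fraction of base models voting for class 1.
    return lambda row: sum(m(row) for m in models) / len(models)


def fit_mean_imputer(X):
    """Stand-in imputation: replace None with the per-feature mean of observed values."""
    means = []
    for j in range(len(X[0])):
        seen = [row[j] for row in X if row[j] is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return lambda row: [means[j] if v is None else v for j, v in enumerate(row)]


def grouped_folds(groups, n_folds=5, seed=0):
    """Assign whole groups (e.g. customer IDs) to folds, so that all credit
    requests of one customer land in the same fold."""
    ids = sorted(set(groups))
    random.Random(seed).shuffle(ids)
    fold_of = {g: i % n_folds for i, g in enumerate(ids)}
    folds = defaultdict(list)
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return [folds[f] for f in range(n_folds)]


def cross_val_scores(X, y, groups, n_folds=5):
    """Out-of-fold scores; imputation and subagging are refit on each training
    fold only, so nothing from the held-out customers leaks into them."""
    scores = [None] * len(y)
    for test_idx in grouped_folds(groups, n_folds):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        impute = fit_mean_imputer([X[i] for i in train])
        model = subag([impute(X[i]) for i in train], [y[i] for i in train])
        for i in test_idx:
            scores[i] = model(impute(X[i]))
    return scores


# Made-up toy data: (feature_0, feature_1), None = missing; label 1 = "bad" request.
X = [(10, 0.5), (12, None), (11, 0.2), (13, 0.9), (0, 0.4), (0, None),
     (0, 0.7), (0, 0.1), (0, 0.3), (0, 0.8), (0, None), (0, 0.6)]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
groups = ["c1", "c1", "c2", "c3", "c4", "c4", "c5", "c5", "c6", "c7", "c8", "c8"]
scores = cross_val_scores(X, y, groups)
print(scores)
```

Fitting the imputer inside each training fold, rather than once on the full sample, keeps held-out customers from influencing the imputed values. On this deliberately separable toy data every stump splits on `feature_0`, so the out-of-fold scores are exactly 1.0 for the four bad requests and 0.0 for the rest; on real data the balanced subsamples differ from model to model and the averaged score becomes graded.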

Pages: 490-499

Number of pages: 10
