Cost-aware Credit-scoring Framework Based on Resampling and Feature Selection

Authors
Mou, Yunhan [1]
Pu, Zihao [2]
Feng, Duanyu [3]
Luo, Yingting [3]
Lai, Yanzhao [4]
Huang, Jimin [5]
Tian, Youjing [6]
Xiao, Fang [6]
Affiliations
[1] Yale Sch Publ Hlth, Dept Biostat, New Haven, CT USA
[2] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
[3] Sichuan Univ, Coll Math, Chengdu, Peoples R China
[4] Southwest Jiaotong Univ, Sch Econ & Management, Chengdu, Peoples R China
[5] Chancefocus Asset Management Shanghai Co, Shanghai, Peoples R China
[6] Sichuan Jinding Fortune Informat Technol Co Ltd, Chengdu, Peoples R China
Keywords
Credit scoring; Pre-learning resampling; Financial indicators; Feature selection; Classification algorithms; Risk assessment; Model; Machine
DOI
10.1007/s10614-024-10808-w
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
Credit loans are fundamental to the financial industry, and managing their risks effectively is essential. Financial companies face two challenges when using credit scoring to control these risks. First, datasets are often imbalanced, with far more non-default cases than default ones, a problem usually handled with oversampling. Few methods, however, further improve the quality of the training dataset by addressing the critical samples that may confuse the final classifier while maintaining the interpretability of the final model. Second, common model evaluation indicators may not accurately reflect the financial loss associated with incorrect predictions or the cost of collecting features. To address these challenges, we propose the Cost AwarE CRediT ScorIng Framework Based on ResamplIng and FeaturE Selection (CERTIFIES). In this framework, we develop a pre-learning resampling approach that employs multiple machine learning methods as assistant classifiers to detect critical data samples after oversampling. This approach further enhances the overall performance of the chief classifier, logistic regression, without compromising its interpretability. Additionally, in the model evaluation step, we design a cost-aware evaluation indicator that accounts for the actual loss due to incorrect predictions and the cost of collecting the various features. This indicator provides a way to perform feature selection based on financial costs. To demonstrate the effectiveness of the proposed method, we apply it to our credit scoring dataset, collected by local financial companies, as well as to two public datasets.
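
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the function names, the loss values, the disagreement threshold, and the additive form of the cost are all assumptions made for illustration only. The first function flags "critical" samples as those whose label is contradicted by several assistant classifiers; the second adds misclassification losses to the cost of collecting the selected features for every applicant.

```python
from typing import List, Sequence


def flag_critical(labels: Sequence[int],
                  assistant_votes: Sequence[Sequence[int]],
                  min_disagree: int = 2) -> List[int]:
    """Return indices of samples that at least `min_disagree` assistant
    classifiers predict differently from the recorded label.
    The threshold rule is a hypothetical choice, not taken from the paper."""
    flagged = []
    for i, (y, votes) in enumerate(zip(labels, assistant_votes)):
        disagreements = sum(1 for v in votes if v != y)
        if disagreements >= min_disagree:
            flagged.append(i)
    return flagged


def cost_aware_score(y_true: Sequence[int],
                     y_pred: Sequence[int],
                     feature_costs: Sequence[float],
                     fn_loss: float = 10,
                     fp_loss: float = 1) -> float:
    """Total cost = loss from missed defaults (false negatives)
    + loss from rejected good applicants (false positives)
    + cost of collecting every selected feature for every applicant.
    Label 1 means default. All loss values are illustrative assumptions."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    collection = sum(feature_costs) * len(y_true)
    return fn * fn_loss + fp * fp_loss + collection


# Three samples, three assistant classifiers each.
critical = flag_critical([1, 0, 1], [[1, 1, 0], [1, 1, 1], [0, 0, 1]])

# Five applicants, two features costing 2 and 3 units per applicant.
cost = cost_aware_score([1, 0, 0, 1, 0], [1, 0, 1, 0, 0], [2, 3])
```

Under these assumptions, a feature subset could be compared with another by recomputing `cost_aware_score` with each subset's predictions and collection costs and preferring the lower total. How the flagged samples are then treated is not stated in the abstract, so the sketch stops at detection.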
Pages: 26