Cost-aware Credit-scoring Framework Based on Resampling and Feature Selection

Authors
Mou, Yunhan [1]
Pu, Zihao [2]
Feng, Duanyu [3]
Luo, Yingting [3]
Lai, Yanzhao [4]
Huang, Jimin [5]
Tian, Youjing [6]
Xiao, Fang [6]
Affiliations
[1] Yale Sch Publ Hlth, Dept Biostat, New Haven, CT USA
[2] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
[3] Sichuan Univ, Coll Math, Chengdu, Peoples R China
[4] Southwest Jiaotong Univ, Sch Econ & Management, Chengdu, Peoples R China
[5] Chancefocus Asset Management Shanghai Co, Shanghai, Peoples R China
[6] Sichuan Jinding Fortune Informat Technol Co Ltd, Chengdu, Peoples R China
Keywords
Credit scoring; Pre-learning resampling; Financial indicators; Feature selection; Classification algorithms; Risk assessment; Model; Machine
DOI
10.1007/s10614-024-10808-w
Chinese Library Classification code
F [Economics]
Discipline classification code
02
Abstract
Credit loans are fundamental to the financial industry, and managing their risks effectively is essential. Financial companies face two challenges when using credit scoring to control these risks. First, datasets are often imbalanced, with far more non-default cases than default ones, a problem usually handled with oversampling. Few methods, however, go further and improve the quality of the training dataset by addressing the critical samples that may confuse the final classifier, while also maintaining the interpretability of the final model. Second, common model evaluation indicators may not accurately reflect the financial loss caused by incorrect predictions or the cost of collecting features. To address these challenges, we propose the Cost AwarE CRediT ScorIng Framework Based on ResamplIng and FeaturE Selection (CERTIFIES). In this framework, we develop a pre-learning resampling approach that employs multiple machine learning methods as assistant classifiers to detect critical data samples after oversampling. This approach further improves the overall performance of the chief classifier, logistic regression, without compromising its interpretability. Additionally, for the model evaluation step, we design a cost-aware evaluation indicator that accounts for both the actual loss due to incorrect predictions and the cost of collecting each feature. This indicator makes it possible to perform feature selection based on financial costs. To demonstrate the effectiveness of the proposed method, we apply it to our credit scoring dataset collected by local financial companies, as well as to two public datasets.
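The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how critical samples are detected or how the cost indicator is defined, so the voting rule, the function names (`flag_critical_samples`, `cost_per_applicant`), and all cost values below are assumptions made purely for illustration.

```python
# Illustrative sketch only. The rule "flag a sample when at least
# `min_wrong` assistant classifiers misclassify it" and the cost formula
# are assumptions, not definitions taken from the paper.

def flag_critical_samples(y_true, assistant_preds, min_wrong):
    """Return indices of samples misclassified by at least `min_wrong` assistants."""
    flagged = []
    for i, label in enumerate(y_true):
        wrong = sum(1 for preds in assistant_preds if preds[i] != label)
        if wrong >= min_wrong:
            flagged.append(i)
    return flagged


def cost_per_applicant(y_true, y_pred, fn_cost, fp_cost, feature_costs):
    """Average misclassification loss per applicant plus per-applicant feature cost.

    Label 1 means default. A false negative is a missed default;
    a false positive is a rejected good borrower.
    """
    n = len(y_true)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (fn * fn_cost + fp * fp_cost) / n + sum(feature_costs)


# Hypothetical toy data: six applicants, three assistant classifiers.
y_true = [1, 0, 0, 1, 0, 0]
assistant_preds = [
    [1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
]
critical = flag_critical_samples(y_true, assistant_preds, min_wrong=2)

# Hypothetical costs: a missed default costs 120, a rejected good borrower
# costs 12, and the two collected features cost 0.5 and 1.5 per applicant.
chief_pred = [1, 0, 1, 0, 0, 0]
cost = cost_per_applicant(y_true, chief_pred, 120.0, 12.0, [0.5, 1.5])
print(critical, cost)
```

Under these assumptions, a feature subset could be compared against another by recomputing `cost_per_applicant` with each subset's predictions and feature costs and keeping the cheaper one.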
Pages: 26