Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

Cited by: 1
Authors
Lenka, Sudhansu R. [1 ,2 ]
Bisoy, Sukant Kishoro [1 ]
Priyadarshini, Rojalina [1 ]
Affiliations
[1] CV Raman Global Univ, Dept CSE, Bhubaneswar, India
[2] Trident Acad Technol, Bhubaneswar, India
Keywords
Class imbalanced data; Optimization subset; Feature selection; Ensemble learning; Credit scoring; Resampling; DECISION TREE; RISK; SMOTE; PERFORMANCE; ALGORITHM;
DOI
10.1007/s10115-024-02129-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Credit scoring models are crucial tools for lenders to assess credit risks. Researchers from academia and the financial industry have shown intense interest in these models. However, real credit datasets often have high dimensionality and class imbalance, making it challenging to develop accurate and effective credit scoring models. To address these challenges, a new approach called the Multiple-Optimized Ensemble Learning (MOEL) method has been proposed. In MOEL, a technique called Multiple Diverse Optimized Subsets (MDOS) generates multiple diverse optimized subsets from various weighted random forests. From each subset, more effective and relevant features are selected. Then, a new evaluation measure is applied to each subset to determine the more optimized subsets. These subsets are applied to a novel Mahalanobis-based oversampling (MOS) technique to provide balanced subsets for the base classifier, which lessens the detrimental effects of imbalanced datasets. Finally, a stacking-based ensemble method is applied to the balanced subsets for integration of the base models. The proposed model was evaluated against six high-dimensional imbalanced credit scoring datasets, and it outperformed state-of-the-art methods, exhibiting a mean rank of 1.5 and 1.333 in terms of F1_score and G-mean, respectively.
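The two metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of F1 score and G-mean (the geometric mean of sensitivity and specificity) computed from confusion-matrix counts. The function name `f1_and_gmean` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def f1_and_gmean(tp, fp, tn, fn):
    """Return (F1, G-mean) for a binary classifier from confusion counts.

    Standard definitions, with the minority (e.g. default) class as positive.
    Ratios with a zero denominator are treated as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Hypothetical imbalanced test set: 10 positives, 90 negatives.
balanced_model = f1_and_gmean(tp=8, fp=2, tn=88, fn=2)
# A degenerate model that labels every case negative is 90% accurate,
# yet both metrics collapse to zero.
all_negative_model = f1_and_gmean(tp=0, fp=0, tn=90, fn=10)
```

Both metrics penalize a model that ignores the minority class, which is why they are commonly preferred over plain accuracy on imbalanced data.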
Pages: 5429-5457
Number of pages: 29
Related papers
50 in total
  • [1] Empirical Analysis of Ensemble Learning for Imbalanced Credit Scoring Datasets: A Systematic Review
    Lenka, Sudhansu R.
    Bisoy, Sukant Kishoro
    Priyadarshini, Rojalina
    Sain, Mangal
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [2] Interpretable machine learning for imbalanced credit scoring datasets
    Chen, Yujia
    Calabrese, Raffaella
    Martin-Barragan, Belen
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 312 (01) : 357 - 372
  • [3] An interpretable decision tree ensemble model for imbalanced credit scoring datasets
    My, Bui T. T.
    Ta, Bao Q.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (06) : 10853 - 10864
  • [4] An Incremental Learning Ensemble Method for Imbalanced Credit Scoring
    Tian, Jin
    Liu, Xinye
    Li, Minqiang
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 754 - 759
  • [5] ASE: Anomaly scoring based ensemble learning for highly imbalanced datasets
    Liang, Xiayu
    Gao, Ying
    Xu, Shanrong
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [6] An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data
    Yu, Hualong
    Ni, Jun
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (04) : 657 - 666
  • [7] Optimized ensemble machine learning framework for high dimensional imbalanced bio assays
    Sharma R.
    Hooda N.
    Revue d'Intelligence Artificielle, 2019, 33 (05) : 387 - 392
  • [8] Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework
    Gao, Ruize
    Cui, Shaoze
    Wang, Yu
    Xu, Wei
    FINANCIAL INNOVATION, 2025, 11 (01)
  • [10] A GA-BASED FEATURE SELECTION AND ENSEMBLE LEARNING FOR HIGH-DIMENSIONAL DATASETS
    Xia, Pei-Yong
    Ding, Xiang-Qian
    Jiang, Bai-Ning
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 7 - +