Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

Cited by: 1
Authors
Lenka, Sudhansu R. [1, 2]
Bisoy, Sukant Kishoro [1]
Priyadarshini, Rojalina [1]
Affiliations
[1] CV Raman Global Univ, Dept CSE, Bhubaneswar, India
[2] Trident Acad Technol, Bhubaneswar, India
Keywords
Class imbalanced data; Optimization subset; Feature selection; Ensemble learning; Credit scoring; Resampling; DECISION TREE; RISK; SMOTE; PERFORMANCE; ALGORITHM
DOI
10.1007/s10115-024-02129-z
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Credit scoring models are crucial tools for lenders to assess credit risk, and researchers from academia and the financial industry have shown intense interest in them. However, real credit datasets often have high dimensionality and class imbalance, which makes it challenging to develop accurate and effective credit scoring models. To address these challenges, a new approach called the Multiple-Optimized Ensemble Learning (MOEL) method has been proposed. In MOEL, a technique called Multiple Diverse Optimized Subsets (MDOS) generates multiple diverse optimized subsets from various weighted random forests. From each subset, the more effective and relevant features are selected. Then, a new evaluation measure is applied to each subset to identify the more optimized subsets. A novel Mahalanobis-based oversampling (MOS) technique is then applied to these subsets to provide balanced subsets for the base classifier, which lessens the detrimental effects of imbalanced datasets. Finally, a stacking-based ensemble method is applied to the balanced subsets to integrate the base models. The proposed model was evaluated on six high-dimensional imbalanced credit scoring datasets, and it outperformed state-of-the-art methods, achieving mean ranks of 1.5 and 1.333 in terms of F1-score and G-mean, respectively.
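The abstract reports results in terms of F1-score and G-mean, two measures commonly used for imbalanced classification. As a minimal sketch, assuming the standard textbook definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity), the two can be computed as below. The function names and the toy labels are hypothetical and are not taken from the paper; the paper's own evaluation code is not shown in this record.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem with the given positive label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity); a missing class contributes 0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 2 defaulters (label 1) among 10 applicants.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print("F1-score:", f1_score(y_true, y_pred))
print("G-mean:", round(g_mean(y_true, y_pred), 4))
```

On this toy data 8 of 10 predictions are correct, yet only one of the two minority-class cases is caught. Both measures fall well below the 0.8 accuracy, which illustrates why they are preferred over plain accuracy on imbalanced credit data.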
Pages: 5429 - 5457
Number of pages: 29
Related papers
50 in total
  • [31] A Novel Approach of Ensemble Methods Using the Stacked Generalization for High-dimensional Datasets
    Sharma, Suvita Rani
    Singh, Birmohan
    Kaur, Manpreet
    IETE JOURNAL OF RESEARCH, 2023, 69 (10) : 6802 - 6817
  • [32] Ensemble-Enhanced Semi-Supervised Learning With Optimized Graph Construction for High-Dimensional Data
    Li, Guojie
    Yu, Zhiwen
    Yang, Kaixiang
    Chen, C. L. Philip
    Li, Xuelong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (02) : 1103 - 1119
  • [33] Battering Review Spam Through Ensemble Learning in Imbalanced Datasets
    Khurshid, Faisal
    Zhu, Yan
    Hu, Jie
    Ahmad, Muqeet
    Ahmad, Mushtaq
    COMPUTER JOURNAL, 2022, 65 (07) : 1666 - 1678
  • [34] Improved Contraction-Expansion Subspace Ensemble for High-Dimensional Imbalanced Data Classification
    Xu, Yuhong
    Yu, Zhiwen
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (10) : 5194 - 5205
  • [35] Learning-based intrusion detection for high-dimensional imbalanced traffic
    Gu, Yuheng
    Yang, Yu
    Yan, Yu
    Shen, Fang
    Gao, Minna
    COMPUTER COMMUNICATIONS, 2023, 212 : 366 - 376
  • [36] Ensemble learning-based filter-centric hybrid feature selection framework for high-dimensional imbalanced data
    Kim, Jongmo
    Kang, Jaewoong
    Sohn, Mye
    KNOWLEDGE-BASED SYSTEMS, 2021, 220
  • [37] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
    Maldonado, Sebastian
    Lopez, Julio
    APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105
  • [38] Credit scoring prediction leveraging interpretable ensemble learning
    Liu, Yang
    Huang, Fei
    Ma, Lili
    Zeng, Qingguo
    Shi, Jiale
    JOURNAL OF FORECASTING, 2024, 43 (02) : 286 - 308
  • [39] A Comparative Performance Assessment of Ensemble Learning for Credit Scoring
    Li, Yiheng
    Chen, Weidong
    MATHEMATICS, 2020, 8 (10) : 1 - 19
  • [40] A Relative Evaluation of the Performance of Ensemble Learning in Credit Scoring
    Devi, C. R. Durga
    Chezian, R. Manicka
    2016 IEEE INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER APPLICATIONS (ICACA), 2016 : 161 - 165