Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

Times cited: 1

Authors:

Lenka, Sudhansu R. [1,2]

Bisoy, Sukant Kishoro [1]

Priyadarshini, Rojalina [1]

Affiliations:

[1] CV Raman Global Univ, Dept CSE, Bhubaneswar, India

[2] Trident Acad Technol, Bhubaneswar, India

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2024 / Vol. 66 / Issue 09

Keywords:

Class imbalanced data; Optimization subset; Feature selection; Ensemble learning; Credit scoring; Resampling; DECISION TREE; RISK; SMOTE; PERFORMANCE; ALGORITHM;

DOI:

10.1007/s10115-024-02129-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Credit scoring models are crucial tools for lenders to assess credit risks. Researchers from academia and the financial industry have shown intense interest in these models. However, real credit datasets often have high dimensionality and class imbalance, making it challenging to develop accurate and effective credit scoring models. To address these challenges, a new approach called the Multiple-Optimized Ensemble Learning (MOEL) method has been proposed. In MOEL, a technique called Multiple Diverse Optimized Subsets (MDOS) generates multiple diverse optimized subsets from various weighted random forests. From each subset, more effective and relevant features are selected. Then, a new evaluation measure is applied to each subset to determine the more optimized subsets. These subsets are applied to a novel Mahalanobis-based oversampling (MOS) technique to provide balanced subsets for the base classifier, which lessens the detrimental effects of imbalanced datasets. Finally, a stacking-based ensemble method is applied to the balanced subsets for integration of the base models. The proposed model was evaluated against six high-dimensional imbalanced credit scoring datasets, and it outperformed state-of-the-art methods, exhibiting a mean rank of 1.5 and 1.333 in terms of F1_score and G-mean, respectively.
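
The abstract reports results as F1_score and G-mean, two measures commonly used for imbalanced classification. As a minimal sketch of their standard definitions (not the paper's evaluation code), the following computes both from confusion-matrix counts. The function name `f1_and_gmean` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1, G-mean) for the positive (minority) class.

    tp, fp, fn, tn are confusion-matrix counts. Ratios with a zero
    denominator are treated as 0.0 (a convention assumed here).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Hypothetical counts: 100 defaulters (positive class), 900 non-defaulters.
f1, gmean = f1_and_gmean(tp=80, fp=90, fn=20, tn=810)
print(f"F1 = {f1:.3f}, G-mean = {gmean:.3f}")
```

With these counts, overall accuracy is 0.89, yet F1 is noticeably lower because precision on the minority class is poor. This is why both measures are preferred over accuracy when classes are imbalanced.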

Pages: 5429 - 5457

Page count: 29

50 records in total

[31] A Novel Approach of Ensemble Methods Using the Stacked Generalization for High-dimensional Datasets
Sharma, Suvita Rani
Singh, Birmohan
Kaur, Manpreet
IETE JOURNAL OF RESEARCH, 2023, 69 (10) : 6802 - 6817
[32] Ensemble-Enhanced Semi-Supervised Learning With Optimized Graph Construction for High-Dimensional Data
Li, Guojie
Yu, Zhiwen
Yang, Kaixiang
Chen, C. L. Philip
Li, Xuelong
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (02) : 1103 - 1119
[33] Battering Review Spam Through Ensemble Learning in Imbalanced Datasets
Khurshid, Faisal
Zhu, Yan
Hu, Jie
Ahmad, Muqeet
Ahmad, Mushtaq
COMPUTER JOURNAL, 2022, 65 (07) : 1666 - 1678
[34] Improved Contraction-Expansion Subspace Ensemble for High-Dimensional Imbalanced Data Classification
Xu, Yuhong
Yu, Zhiwen
Chen, C. L. Philip
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (10) : 5194 - 5205
[35] Learning-based intrusion detection for high-dimensional imbalanced traffic
Gu, Yuheng
Yang, Yu
Yan, Yu
Shen, Fang
Gao, Minna
COMPUTER COMMUNICATIONS, 2023, 212 : 366 - 376
[36] Ensemble learning-based filter-centric hybrid feature selection framework for high-dimensional imbalanced data
Kim, Jongmo
Kang, Jaewoong
Sohn, Mye
KNOWLEDGE-BASED SYSTEMS, 2021, 220
[37] Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification
Maldonado, Sebastian
Lopez, Julio
APPLIED SOFT COMPUTING, 2018, 67 : 94 - 105
[38] Credit scoring prediction leveraging interpretable ensemble learning
Liu, Yang
Huang, Fei
Ma, Lili
Zeng, Qingguo
Shi, Jiale
JOURNAL OF FORECASTING, 2024, 43 (02) : 286 - 308
[39] A Comparative Performance Assessment of Ensemble Learning for Credit Scoring
Li, Yiheng
Chen, Weidong
MATHEMATICS, 2020, 8 (10) : 1 - 19
[40] A Relative Evaluation of the Performance of Ensemble Learning in Credit Scoring
Devi, C. R. Durga
Chezian, R. Manicka
2016 IEEE INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER APPLICATIONS (ICACA), 2016 : 161 - 165
