Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

Cited by: 1

Authors:

Lenka, Sudhansu R. [1,2]

Bisoy, Sukant Kishoro [1]

Priyadarshini, Rojalina [1]

Affiliations:

[1] CV Raman Global Univ, Dept CSE, Bhubaneswar, India

[2] Trident Acad Technol, Bhubaneswar, India

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2024 / Vol. 66 / No. 09

Keywords:

Class imbalanced data; Optimization subset; Feature selection; Ensemble learning; Credit scoring; Resampling; DECISION TREE; RISK; SMOTE; PERFORMANCE; ALGORITHM;

DOI:

10.1007/s10115-024-02129-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Credit scoring models are crucial tools for lenders to assess credit risks. Researchers from academia and the financial industry have shown intense interest in these models. However, real credit datasets often have high dimensionality and class imbalance, making it challenging to develop accurate and effective credit scoring models. To address these challenges, a new approach called the Multiple-Optimized Ensemble Learning (MOEL) method has been proposed. In MOEL, a technique called Multiple Diverse Optimized Subsets (MDOS) generates multiple diverse optimized subsets from various weighted random forests. From each subset, more effective and relevant features are selected. Then, a new evaluation measure is applied to each subset to determine the more optimized subsets. These subsets are applied to a novel Mahalanobis-based oversampling (MOS) technique to provide balanced subsets for the base classifier, which lessens the detrimental effects of imbalanced datasets. Finally, a stacking-based ensemble method is applied to the balanced subsets for integration of the base models. The proposed model was evaluated against six high-dimensional imbalanced credit scoring datasets, and it outperformed state-of-the-art methods, exhibiting a mean rank of 1.5 and 1.333 in terms of F1_score and G-mean, respectively.
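The abstract reports results as mean ranks of F1_score and G-mean across six datasets. As a rough illustration of how such figures are typically derived, the sketch below computes both metrics from confusion-matrix counts and averages per-dataset ranks. The function names, the tie-handling rule (tied methods share the average rank), and the score table in the usage note are assumptions for illustration only; the abstract does not state the authors' exact ranking procedure.

```python
from math import sqrt


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(tp, fn, tn, fp):
    """G-mean = sqrt(sensitivity * specificity)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def rank_descending(scores):
    """Rank scores on one dataset (1 = best); ties share the average rank.

    Assumption: average-rank tie handling, as is common in rank-based
    comparisons; the source does not specify this.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def mean_ranks(score_table):
    """score_table maps method name -> list of scores, one per dataset."""
    methods = list(score_table)
    n_datasets = len(score_table[methods[0]])
    totals = {m: 0.0 for m in methods}
    for d in range(n_datasets):
        ranks = rank_descending([score_table[m][d] for m in methods])
        for m, r in zip(methods, ranks):
            totals[m] += r
    return {m: totals[m] / n_datasets for m in methods}
```

For example, with a hypothetical two-method table such as `{"method_a": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4], "method_b": [0.8, 0.7, 0.6, 0.7, 0.6, 0.5]}`, each method wins on three of the six datasets, so `mean_ranks` assigns both a mean rank of 1.5. These numbers are made up and are not taken from the paper.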


Pages: 5429-5457

Page count: 29

50 items in total

[1] Empirical Analysis of Ensemble Learning for Imbalanced Credit Scoring Datasets: A Systematic Review
Lenka, Sudhansu R.
Bisoy, Sukant Kishoro
Priyadarshini, Rojalina
Sain, Mangal
WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
[2] Interpretable machine learning for imbalanced credit scoring datasets
Chen, Yujia
Calabrese, Raffaella
Martin-Barragan, Belen
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 312 (01) : 357 - 372
[3] An interpretable decision tree ensemble model for imbalanced credit scoring datasets
My, Bui T. T.
Ta, Bao Q.
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (06) : 10853 - 10864
[4] An Incremental Learning Ensemble Method for Imbalanced Credit Scoring
Tian, Jin
Liu, Xinye
Li, Minqiang
2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 754 - 759
[5] ASE: Anomaly scoring based ensemble learning for highly imbalanced datasets
Liang, Xiayu
Gao, Ying
Xu, Shanrong
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
[6] An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data
Yu, Hualong
Ni, Jun
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (04) : 657 - 666
[7] Optimized ensemble machine learning framework for high dimensional imbalanced bio assays
Sharma R.
Hooda N.
Revue d'Intelligence Artificielle, 2019, 33 (05) : 387 - 392
[8] Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework
Gao, Ruize
Cui, Shaoze
Wang, Yu
Xu, Wei
FINANCIAL INNOVATION, 2025, 11 (01)
[9] Learning from High-Dimensional and Class-Imbalanced Datasets Using Random Forests
Pes, Barbara
INFORMATION, 2021, 12 (08)
[10] A GA-BASED FEATURE SELECTION AND ENSEMBLE LEARNING FOR HIGH-DIMENSIONAL DATASETS
Xia, Pei-Yong
Ding, Xiang-Qian
Jiang, Bai-Ning
PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 7 - +
