Multiple optimized ensemble learning for high-dimensional imbalanced credit scoring datasets

Cited by: 1
Authors
Lenka, Sudhansu R. [1, 2]
Bisoy, Sukant Kishoro [1]
Priyadarshini, Rojalina [1]
Affiliations
[1] CV Raman Global Univ, Dept CSE, Bhubaneswar, India
[2] Trident Acad Technol, Bhubaneswar, India
Keywords
Class imbalanced data; Optimization subset; Feature selection; Ensemble learning; Credit scoring; Resampling; DECISION TREE; RISK; SMOTE; PERFORMANCE; ALGORITHM;
DOI
10.1007/s10115-024-02129-z
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Credit scoring models are crucial tools for lenders to assess credit risks. Researchers from academia and the financial industry have shown intense interest in these models. However, real credit datasets often have high dimensionality and class imbalance, making it challenging to develop accurate and effective credit scoring models. To address these challenges, a new approach called the Multiple-Optimized Ensemble Learning (MOEL) method has been proposed. In MOEL, a technique called Multiple Diverse Optimized Subsets (MDOS) generates multiple diverse optimized subsets from various weighted random forests. From each subset, more effective and relevant features are selected. Then, a new evaluation measure is applied to each subset to determine the more optimized subsets. These subsets are applied to a novel Mahalanobis-based oversampling (MOS) technique to provide balanced subsets for the base classifier, which lessens the detrimental effects of imbalanced datasets. Finally, a stacking-based ensemble method is applied to the balanced subsets for integration of the base models. The proposed model was evaluated against six high-dimensional imbalanced credit scoring datasets, and it outperformed state-of-the-art methods, exhibiting a mean rank of 1.5 and 1.333 in terms of F1_score and G-mean, respectively.
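The abstract reports results in terms of F1_score and G-mean. As a minimal sketch of how these two measures are commonly defined for binary classification (the paper's exact definitions are not given in this record, so the formulas below are an assumption, and the function names are hypothetical), both can be computed from confusion-matrix counts:

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity) (assumed common definition)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Illustrative toy labels (not data from the paper): 2 defaulters among 10 applicants.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(f1_score(y_true, y_pred), g_mean(y_true, y_pred))
```

On imbalanced data such as this toy example, plain accuracy would be 0.8 even though only half of the minority class is detected, which is why measures like these are usually preferred for imbalanced credit scoring.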
Pages: 5429-5457
Page count: 29