A method of classifying imbalanced credit data based on the AC-CTGAN hybrid sampling algorithm

Cited by: 0
Authors
Chen, Tinggui [1, 2]
Gu, Hailian [2]
Yang, Zhiyu [3]
Yang, Jianjun [4]
Wang, Bing [5]
Affiliations
[1] Wuhan Univ Sci & Technol, Hubei Key Lab Mech Transmiss & Mfg Engn, Wuhan 430081, Peoples R China
[2] Zhejiang Gongshang Univ, Sch Stat & Math, 18 Xuezheng St, Hangzhou 314423, Zhejiang, Peoples R China
[3] Beijing Branch China Bohai Bank Co Ltd, 218 Haihe East Rd, Beijing, Peoples R China
[4] Univ North Georgia, Dept Comp Sci & Informat Syst, 3820 Mundy Mill Rd, Oakwood, GA 30566 USA
[5] Zhejiang Gongshang Univ, Hangzhou Coll Commerce, Sch Artificial Intelligence & Elect Commerce, Hangzhou, Peoples R China
Source
JOURNAL OF CREDIT RISK | 2024 / Vol. 20 / Issue 03
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
credit risk identification; conditional tabular generative adversarial networks (CTGAN); adaptive clustering; hybrid sampling; within-imbalance issues; SMOTE;
DOI
10.21314/JCR.2024.007
Chinese Library Classification (CLC) number
F8 [Public Finance, Finance];
Discipline classification code
0202;
Abstract
The rapid growth of consumer credit services has heightened financial institutions' need for stronger risk management capabilities as they strive to satisfy individuals' varied consumption preferences. Identifying personal credit risk is crucial in financial risk management, so financial institutions need a systematic and effective credit risk identification framework to reduce the likelihood of credit defaults. To address the class imbalance of credit data, this paper works at the data level and proposes an adaptive cluster mixed sampling method based on conditional tabular generative adversarial networks (AC-CTGAN). The method first uses the edited nearest neighbors (ENN) algorithm for preliminary denoising of the original credit data, then employs an improved K-means algorithm to partition the minority samples into multiple subclusters. The local density of each subcluster is calculated, and the oversampling weight of each subcluster is adaptively determined from its local density. Finally, minority samples are generated via the CTGAN, and the decision boundary is clarified via the Tomek links algorithm. Comparative experimental results show that the minority class samples generated by the AC-CTGAN algorithm realistically reflect the distribution of the original data, reduce class overlap and limit the introduction of new noisy data, while increasing sample diversity. The potential within-class imbalance of credit data is also alleviated to some extent. Risk-identification models trained on credit data processed by the AC-CTGAN algorithm generalize better than models trained on data processed by the synthetic minority oversampling technique (SMOTE), SMOTE variants and the CTGAN.
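The adaptive weighting step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the density formula or the direction of the weighting, so everything here is an assumption made for illustration: local density is taken as the inverse of the mean distance to the subcluster centroid, weights are taken as inversely proportional to density (sparser subclusters receive more synthetic samples), and the function names `local_density`, `oversampling_weights` and `allocate` are hypothetical. The ENN, K-means, CTGAN and Tomek links stages are not implemented.

```python
import math


def local_density(points):
    """Assumed density measure: inverse of the mean distance to the centroid."""
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    mean_dist = sum(math.dist(p, centroid) for p in points) / len(points)
    # Small epsilon avoids division by zero when all points coincide.
    return 1.0 / (mean_dist + 1e-12)


def oversampling_weights(densities):
    """Assumed rule: weight proportional to inverse density, normalized to sum to 1."""
    inverse = [1.0 / d for d in densities]
    total = sum(inverse)
    return [v / total for v in inverse]


def allocate(n_synthetic, weights):
    """Split n_synthetic across subclusters with largest-remainder rounding."""
    raw = [w * n_synthetic for w in weights]
    counts = [math.floor(r) for r in raw]
    leftover = n_synthetic - sum(counts)
    # Hand the remaining samples to the subclusters with the largest fractions.
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Two hypothetical minority subclusters: one tight, one spread out.
dense = [(0, 0), (0, 1), (1, 0), (1, 1)]
sparse = [(10, 10), (10, 14), (14, 10), (14, 14)]

densities = [local_density(dense), local_density(sparse)]
weights = oversampling_weights(densities)
counts = allocate(10, weights)  # synthetic samples to request per subcluster
```

Under these assumptions the spread-out subcluster receives the larger share of the ten synthetic samples. In a full pipeline each count would be passed to a generator trained on, or conditioned on, the corresponding subcluster.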
Number of pages: 116