Oversampling Method for Imbalanced Data Using Credible Counterfactual

Cited by: 0
Authors
Gao, Feng [1]
Song, Mei [1]
Zhu, Yi [1]
Affiliation
[1] School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221000, China
Keywords
Classification (of information) - Information use - Support vector machines
DOI
10.3778/j.issn.1002-8331.2211-0413
Abstract
A new oversampling method for imbalanced datasets based on counterfactuals (CF) is proposed. It additionally removes non-credible synthetic samples, and aims to address the inability of traditional sampling methods to make full use of the information in the dataset. Its core idea is to synthesize new samples from the original instance features of the dataset. Compared with traditional interpolation-based oversampling, it can more fully mine the decision-boundary information in the data, which provides more useful information to the classifier and improves classification performance. Extensive comparative experiments were carried out on 9 imbalanced datasets from KEEL and UCI, with 5 classifiers (SVM, DT, Logistic, RF, AdaBoost) and 4 traditional oversampling methods (SMOTE, B1-SMOTE, B2-SMOTE, ADASYN). The results show that the algorithm achieves higher AUC, F1, and G-mean values and can effectively address the class imbalance problem. © 2024 Editorial Department of Scientia Agricultura Sinica. All rights reserved.
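The abstract reports F1 and G-mean without defining them. As a minimal sketch, assuming the commonly used binary definitions (F1 as the harmonic mean of precision and recall on the minority class, G-mean as the geometric mean of sensitivity and specificity), the two scores can be computed from a confusion matrix as follows. The function names and the toy labels are illustrative only and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, tn, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """G-mean = sqrt(sensitivity * specificity)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced example (hypothetical data): 2 minority, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts[0] + counts[2]) / len(y_true)
f1 = f1_score(*counts)
gm = g_mean(*counts)
```

On this toy example the accuracy is 0.8 even though only half of the minority samples are recovered, which is the usual reason imbalance studies report F1 and G-mean rather than accuracy alone.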
Pages: 165-171