Evaluating the Performance of Data Level Methods Using KEEL Tool to Address Class Imbalance Problem

Authors
Kamlesh Upadhyay
Prabhjot Kaur
Deepak Kumar Verma
Affiliations
[1] Lingayas Vidyapeeth, Department of Information Technology
[2] Maharaja Surajmal Institute of Technology
[3] Lingayas Vidyapeeth
Keywords
Algorithm level approaches; Binary classification; Class imbalance problem; Data level approaches; Ensembled approach
DOI
Not available
Abstract
The class imbalance problem (CIP) has become a prominent topic in machine learning in recent years because of its growing practical importance. As the application areas of technology increase, the size and variety of data also increase. Much real-world raw data is naturally imbalanced, for example in credit card fraud, fraudulent telephone calls, shuttle system failure, text classification, nuclear explosions, oil spill detection, and detection of brain tumor images. Classification algorithms cannot classify imbalanced data accurately, and their results are biased toward the larger class. This is known as the class imbalance problem. This paper assesses various data-level methods that are used to balance the data before classification. It also discusses the characteristics of data that affect the class imbalance problem and the reasons why traditional classification algorithms cannot handle it. In addition, it discusses other data irregularities that make the CIP more severe, such as the size of the data, overlapping classes, noise in the data, and the data distribution within each class. The paper empirically compares 20 data-level methods on 44 real imbalanced UCI data sets, with imbalance ratios ranging from 1.82 to 129.44, using the KEEL tool. The performance of the methods is assessed using the AUC, F-measure, and G-mean metrics, and the results are analyzed and presented graphically.
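The quantities named in the abstract can be illustrated with a short sketch. It is not the paper's KEEL workflow: it assumes a binary problem, uses plain random oversampling as a stand-in for one simple data-level method, and computes only G-mean and F-measure (AUC is omitted because it needs ranked scores rather than hard labels). All function names here are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Illustrative data-level method (an assumption, not taken from the paper):
    duplicate randomly chosen minority rows until the minority class is as
    large as the majority class. Assumes exactly two classes."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    pool = [r for r, y in zip(rows, labels) if y == minority]
    deficit = max(counts.values()) - counts[minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority] * deficit


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """Return (G-mean, F-measure), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    # G-mean is the geometric mean of the two per-class accuracies.
    return sqrt(recall * specificity), f_measure


if __name__ == "__main__":
    labels = [0] * 9 + [1] * 3
    rows = [[float(i)] for i in range(len(labels))]
    print("imbalance ratio before:", imbalance_ratio(labels))
    _, balanced = random_oversample(rows, labels)
    print("imbalance ratio after:", imbalance_ratio(balanced))
    print(g_mean_and_f_measure([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Unlike plain accuracy, G-mean drops to zero whenever a classifier ignores one class entirely, which is why it is commonly reported alongside AUC and F-measure on imbalanced data.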
Pages: 9741 - 9754
Number of pages: 13