Evaluating the Performance of Data Level Methods Using KEEL Tool to Address Class Imbalance Problem

Authors
Kamlesh Upadhyay
Prabhjot Kaur
Deepak Kumar Verma
Affiliations
[1] Lingayas Vidyapeeth, Department of Information Technology
[2] Maharaja Surajmal Institute of Technology
[3] Lingayas Vidyapeeth
Keywords
Algorithm level approaches; Binary classification; Class imbalance problem; Data level approaches; Ensembled approach;
DOI
Not available
Abstract
The class imbalance problem (CIP) has become a prominent topic in machine learning in recent years because of its growing practical importance. As the application areas of technology expand, the size and variety of data also increase. Most real-world raw data is naturally imbalanced, as in credit card fraud, fraudulent telephone calls, shuttle system failures, text classification, nuclear explosions, oil spill detection, and brain tumor image detection. Classification algorithms cannot classify imbalanced data accurately, and their results are biased toward the larger class. This is known as the class imbalance problem. This paper assesses various data-level methods that are used to balance the data before classification. It also discusses the characteristics of data that affect the class imbalance problem and the reasons why traditional classification algorithms cannot handle it. In addition, it discusses other data irregularities that make the CIP more critical, such as the size of the data, overlapping classes, noise in the data, and the data distribution within each class. The paper empirically compares 20 data-level methods on 44 real imbalanced UCI data sets, with imbalance ratios ranging from 1.82 to 129.44, using the KEEL tool. The performance of the methods is assessed using the AUC, F-measure, and G-mean metrics, and the results are analyzed and presented graphically.
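The imbalance ratio and two of the metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from KEEL: the function names and the toy labels are hypothetical, and the definitions assumed here are the commonly used ones (imbalance ratio as majority count divided by minority count, G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall).

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, fp):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 2 minority samples (1) and 8 majority samples (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A classifier that always predicts the majority class.
y_majority = [0] * 10

tp, fn, fp, tn = confusion_counts(y_true, y_majority, positive=1)
accuracy = (tp + tn) / len(y_true)
print(imbalance_ratio(y_true), accuracy, g_mean(tp, fn, fp, tn), f_measure(tp, fn, fp))
```

In this toy case the always-majority classifier reaches 80% accuracy while both G-mean and F-measure are zero, which is one reason such metrics are generally preferred over accuracy on imbalanced data.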
Pages: 9741-9754
Page count: 13