A Study on Classifying Imbalanced Datasets

Cited by: 0
Authors
Lakshmi, T. Jaya [1]
Prasad, Ch. Siva Rama [2]
Affiliations
[1] Vasireddy Venkatadri Inst Technol, Guntur, India
[2] NRI Inst Technol, Guntur, India
Keywords
DATA-SETS; CLASSIFICATION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many real-world problems are modeled as binary classification problems, and often the samples of one class outnumber those of the other. This imbalance reduces prediction accuracy on minority class samples while the overall accuracy remains high. Ignoring the misclassification rate of the minority class causes severe problems in many cases, such as fraudulent credit card transactions, medical diagnosis, and e-mail foldering. Many classification algorithms in the literature are designed for balanced datasets and treat majority and minority class samples equally. In this study, the existing solutions to the class imbalance problem and the common evaluation techniques used for class imbalance are reviewed. The solutions were applied to three real-world datasets. It is observed that a combination of SMOTE and Bagging with Random Forest produced the best overall accuracy on the minority class.
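The abstract's point that overall accuracy can stay high while minority class accuracy collapses can be illustrated with a small sketch. The data below is hypothetical (95 majority samples, 5 minority samples) and the "classifier" simply predicts the majority class for everything; neither comes from the paper, and the helper names `accuracy` and `recall` are chosen here for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of samples of class `positive` that were predicted as `positive`."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == positive]
    if not pairs:
        return 0.0
    hits = sum(p == positive for _, p in pairs)
    return hits / len(pairs)


# Hypothetical imbalanced labels: 0 = majority class, 1 = minority class.
y_true = [0] * 95 + [1] * 5

# A trivial baseline that always predicts the majority class.
y_pred = [0] * 100

overall_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)

print(f"overall accuracy: {overall_accuracy:.2f}")
print(f"minority recall:  {minority_recall:.2f}")
```

Under these assumptions the baseline is right on 95 of 100 samples yet finds none of the minority samples, which is why per-class measures are usually reported alongside overall accuracy for imbalanced data.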
Pages: 141-145
Page count: 5