A Study on Classifying Imbalanced Datasets

Cited by: 0
Authors
Lakshmi, T. Jaya [1]
Prasad, Ch. Siva Rama [2]
Affiliations
[1] Vasireddy Venkatadri Inst Technol, Guntur, India
[2] NRI Inst Technol, Guntur, India
Keywords
DATA-SETS; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many real-world problems are modeled as binary classification problems, and often the samples of one class outnumber those of the other. This imbalance reduces prediction accuracy on minority-class samples while overall accuracy remains high. Ignoring the misclassification rate of the minority class causes severe problems in many applications, such as detecting fraudulent credit card transactions, medical diagnosis, and e-mail foldering. Many classification algorithms in the literature are designed for balanced datasets and treat majority-class and minority-class samples equally. In this study, existing solutions to the class imbalance problem and the evaluation techniques commonly used for it are reviewed. The solutions were applied to three real-world datasets. A combination of SMOTE and Bagging with Random Forest was observed to produce the best overall accuracy on the minority class.
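The two ideas in the abstract that lend themselves to a small illustration are the gap between overall accuracy and minority-class accuracy, and SMOTE-style oversampling. The sketch below is not the authors' code: the toy labels, the five minority points, and the helper names `accuracy`, `recall`, and `smote_like` are all assumptions made for illustration, and the Bagging with Random Forest step is omitted. In practice one would more likely use `SMOTE` from imbalanced-learn together with `BaggingClassifier` and `RandomForestClassifier` from scikit-learn.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of samples of class `positive` that were predicted as `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Hypothetical helper; expects at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy labels (assumed): 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that always predicts the majority class

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0

# Toy minority points (assumed), oversampled until the classes are balanced.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8)]
new_points = smote_like(minority, n_new=90)
print(len(minority) + len(new_points))  # 95
```

The always-majority predictor scores 95% overall accuracy while finding none of the minority samples, which is why evaluation for imbalanced data looks at per-class measures. The oversampler only ever places new points on segments between existing minority points, so it enlarges the minority class without copying samples verbatim.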
Pages: 141-145
Number of pages: 5