Logistic Regression and Random Forest for Effective Imbalanced Classification

Cited by: 12
Authors
Luo, Hanwu [1]
Pan, Xiubao [1]
Wang, Qingshun [2]
Ye, Shasha [2]
Qian, Ying [2]
Affiliations
[1] East Inner Mongolia Elect Power Co Ltd, Hohhot, Peoples R China
[2] East China Normal Univ, Dept Comp Sci & Technol, Shanghai, Peoples R China
Keywords
imbalanced classification; Random Forest; Logistic Regression; cost-sensitive classification;
DOI
10.1109/COMPSAC.2019.00139
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Data mining and machine learning techniques are now widely applied in many fields. Many real-life datasets are imbalanced, with far fewer significant (positive) samples than unimportant ones, because representative positive examples are hard to collect. Under these circumstances, the conventional aim of maximizing overall classification accuracy, and most standard machine learning methods built around it, may not be suitable for the imbalanced problem. In this work, we compare the performance of random forest and logistic regression on the prediction of an imbalanced dataset. We propose several ways to enhance the two models based on cost-sensitive learning so that they provide more accurate predictions when dealing with imbalanced datasets.
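The abstract does not say how the cost-sensitive variants are built, so the following is only a minimal sketch of one common form of cost-sensitive learning: a class-weighted log loss for logistic regression, in which errors on the rare positive class cost more. The synthetic one-feature dataset, the `pos_cost` parameter, and all function names are assumptions made for illustration and are not taken from the paper. Random forest is not covered here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, pos_cost=1.0, lr=0.1, epochs=300):
    """Batch gradient descent on a class-weighted log loss.

    pos_cost is a hypothetical per-sample cost for the positive class;
    pos_cost=1.0 gives ordinary (cost-insensitive) logistic regression.
    """
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    costs = [pos_cost if yi == 1 else 1.0 for yi in y]
    total = sum(costs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi, ci in zip(X, y, costs):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = ci * (p - yi)  # cost-scaled gradient of the log loss
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def predict(X, w, b):
    """Label a sample positive when its predicted probability is >= 0.5."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0
            for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model recovers."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# Assumed synthetic data: 950 negatives and 50 positives with overlapping
# one-dimensional Gaussian features (a 19:1 imbalance).
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0)] for _ in range(950)] + \
    [[rng.gauss(1.5, 1.0)] for _ in range(50)]
y = [0] * 950 + [1] * 50

w_plain, b_plain = fit_logreg(X, y, pos_cost=1.0)
w_cost, b_cost = fit_logreg(X, y, pos_cost=950 / 50)  # inverse class ratio

pred_plain = predict(X, w_plain, b_plain)
pred_cost = predict(X, w_cost, b_cost)

acc_plain, acc_cost = accuracy(y, pred_plain), accuracy(y, pred_cost)
recall_plain, recall_weighted = recall(y, pred_plain), recall(y, pred_cost)

print("plain:          accuracy=%.3f recall=%.3f" % (acc_plain, recall_plain))
print("cost-sensitive: accuracy=%.3f recall=%.3f" % (acc_cost, recall_weighted))
```

Setting the positive-class cost to the inverse class ratio is a common heuristic rather than a rule. Comparing accuracy with recall on the positive class shows why overall accuracy alone can mislead on imbalanced data: a model that rarely predicts the minority class can still score high accuracy.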
Pages: 916-917
Page count: 2