Enhanced SMOTE Algorithm for Classification of Imbalanced Big-Data using Random Forest

Authors
Bhagat, Reshma C. [1]
Patil, Sachin S. [1]
Affiliations
[1] Rajarambapu Inst Technol, Dept CSE, Islampur Sangli, MS, India
Keywords
Data mining; Multi-class Imbalanced data; Oversampling; MapReduce; Machine Learning;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
In the era of big data, applications that generate tremendous amounts of data have become a main focus of attention, owing to the sharp growth in data generation and storage over the last few years. This scenario is challenging for data mining techniques that are not adapted to the new space and time requirements. In many real-world applications, the classification of imbalanced datasets is a point of particular interest. Most classification methods focus on the two-class imbalanced problem, so it is necessary to address the multi-class imbalanced problem, which exists in real-world domains. In the proposed work, we introduce a methodology for the classification of multi-class imbalanced data. The methodology consists of two steps. In the first step, binarization techniques (OVA and OVO) decompose the original dataset into subsets of binary classes. In the second step, the SMOTE algorithm is applied to each imbalanced binary-class subset in order to obtain balanced data. Finally, a Random Forest (RF) classifier is used to perform the classification. Specifically, the oversampling technique is adapted to big data using MapReduce so that it can handle datasets as large as needed. An experimental study is carried out to evaluate the performance of the proposed method. For the experimental analysis, we used different datasets from the UCI repository, and the proposed system is implemented on the Apache Hadoop and Apache Spark platforms. The results obtained show that the proposed method outperforms other methods.
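The two steps named in the abstract can be illustrated with a small single-machine sketch. This is not the authors' implementation: it assumes OVO decomposition only, a plain interpolation form of SMOTE, and toy data invented for illustration. The function names (`ovo_subsets`, `smote`, `balance_binary`) and the parameter `k` are hypothetical. The MapReduce distribution and the Random Forest training step are omitted.

```python
import random
from itertools import combinations

def ovo_subsets(X, y):
    """One-vs-One: build one binary subset per unordered pair of class labels."""
    subsets = {}
    for a, b in combinations(sorted(set(y)), 2):
        subsets[(a, b)] = [(x, label) for x, label in zip(X, y) if label in (a, b)]
    return subsets

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        return []  # no neighbour to interpolate towards
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((u - v) ** 2 for u, v in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([u + gap * (v - u) for u, v in zip(base, neighbour)])
    return synthetic

def balance_binary(rows, rng=None):
    """Oversample the smaller class of a binary subset up to the larger one."""
    by_class = {}
    for x, label in rows:
        by_class.setdefault(label, []).append(x)
    (small, small_pts), (_, large_pts) = sorted(
        by_class.items(), key=lambda kv: len(kv[1])
    )
    new_pts = smote(small_pts, len(large_pts) - len(small_pts), rng=rng)
    return rows + [(p, small) for p in new_pts]

# Toy three-class data (invented for illustration): 6 "a", 2 "b", 3 "c".
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.1, 0.1],
     [5.0, 5.0], [5.2, 5.1],
     [9.0, 0.0], [9.1, 0.2], [9.2, 0.1]]
y = ["a"] * 6 + ["b"] * 2 + ["c"] * 3

balanced = {
    pair: balance_binary(rows, random.Random(0))
    for pair, rows in ovo_subsets(X, y).items()
}
```

Each entry of `balanced` is a binary subset in which both classes have the same number of samples. In a pipeline like the one the abstract describes, a classifier would then be trained on each balanced subset and the pairwise predictions combined.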
Pages: 403-408
Number of pages: 6