Enhanced SMOTE Algorithm for Classification of Imbalanced Big-Data using Random Forest

Cited by: 0
Authors
Bhagat, Reshma C. [1 ]
Patil, Sachin S. [1 ]
Affiliations
[1] Rajarambapu Inst Technol, Dept CSE, Islampur Sangli, MS, India
Keywords
Data mining; Multi-class Imbalanced data; Oversampling; MapReduce; Machine Learning;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In the era of big data, applications that generate tremendous amounts of data have become a main focus of attention, owing to the sharp growth in data generation and storage over the last few years. This scenario is challenging for data mining techniques that are not adapted to the new space and time requirements. In many real-world applications, the classification of imbalanced datasets is a central concern. Most classification methods focus on the two-class imbalanced problem, so it is necessary to address the multi-class imbalanced problem, which also arises in real-world domains. In this work, we introduce a methodology for the classification of multi-class imbalanced data. The methodology consists of two steps. In the first step, binarization techniques (one-vs-all, OVA, and one-vs-one, OVO) decompose the original dataset into subsets of binary classes. In the second step, the SMOTE algorithm is applied to each imbalanced binary subset to obtain balanced data. Finally, a Random Forest (RF) classifier performs the classification. Specifically, the oversampling technique is adapted to big data using MapReduce so that it can handle datasets as large as needed. An experimental study is carried out to evaluate the performance of the proposed method. For the experimental analysis, we used different datasets from the UCI repository, and the proposed system is implemented on the Apache Hadoop and Apache Spark platforms. The results show that the proposed method outperforms other methods.
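The two steps named in the abstract (OVA binarization, then SMOTE on each binary subset) can be illustrated with a minimal single-machine sketch. This record does not include the authors' implementation, so everything below is an assumption made for illustration: the function names `one_vs_all`, `smote`, and `balance_binary` are hypothetical, the neighbour count `k=3` is arbitrary, the MapReduce distribution is not reproduced, and the Random Forest training step is left out.

```python
import math
import random
from collections import Counter


def one_vs_all(labels):
    """OVA binarization: for each class, relabel it 1 and every other class 0."""
    for cls in sorted(set(labels)):
        yield cls, [1 if y == cls else 0 for y in labels]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not seed),
            key=lambda p: math.dist(seed, p),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment seed -> target
        synthetic.append(tuple(s + gap * (t - s) for s, t in zip(seed, target)))
    return synthetic


def balance_binary(samples, binary_labels, k=3, rng=None):
    """Oversample the smaller binary class until both classes have equal size."""
    counts = Counter(binary_labels)
    minority_label = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority_label]
    minority = [x for x, y in zip(samples, binary_labels) if y == minority_label]
    # Interpolation needs at least two minority samples (assumption of this sketch).
    new_points = smote(minority, deficit, k, rng) if deficit and len(minority) > 1 else []
    return samples + new_points, binary_labels + [minority_label] * len(new_points)


# Toy three-class dataset (made-up values, not from the paper).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.1),
     (5.0, 5.0), (5.2, 5.1), (5.1, 5.3),
     (9.0, 0.0), (9.2, 0.3)]
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2

for cls, binary in one_vs_all(y):
    X_bal, y_bal = balance_binary(X, binary)
    print(cls, dict(Counter(y_bal)))
```

In a full pipeline, a classifier such as a Random Forest would then be trained on each balanced binary subset; OVO would instead build one subset per pair of classes.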
Pages: 403-408
Number of pages: 6
Related papers
50 items in total
  • [41] Predicting the Risk of Diabetes in Big Data Electronic Health Records by using Scalable Random Forest Classification Algorithm
    Rallapalli, Sreekanth
    Suryakanthi, T.
    2016 THIRD INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND ENGINEERING (ICACCE 2016), 2016, : 281 - 284
  • [42] Random forest algorithm for classification of multiwavelength data
    Gao, Dan
    Zhang, Yan-Xia
    Zhao, Yong-Heng
    RESEARCH IN ASTRONOMY AND ASTROPHYSICS, 2009, 9 (02) : 220 - 226
  • [43] A novel Random Forest integrated model for imbalanced data classification problem
    Gu, Qinghua
    Tian, Jingni
    Li, Xuexian
    Jiang, Song
    KNOWLEDGE-BASED SYSTEMS, 2022, 250
  • [44] Random forest for big data classification in the internet of things using optimal features
    Lakshmanaprabu, S. K.
    Shankar, K.
    Ilayaraja, M.
    Nasir, Abdul Wahid
    Vijayakumar, V.
    Chilamkurti, Naveen
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (10) : 2609 - 2618
  • [46] Big-Data Clustering with Genetic Algorithm
    Mortezanezhad, Afsaneh
    Daneshifar, Ebrahim
    2019 IEEE 5TH CONFERENCE ON KNOWLEDGE BASED ENGINEERING AND INNOVATION (KBEI 2019), 2019, : 702 - 706
  • [47] Imbalanced data classification based on DB-SLSMOTE and random forest
    Han, Qi
    Yang, Rui
    Wan, Zitong
    Chen, Shaozhi
    Huang, Mengjie
    Wen, Huiqing
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 6271 - 6276
  • [49] Imbalanced educational data classification: an effective approach with resampling and random forest
    Vo Thi Ngoc Chau
    Nguyen Hua Phung
    PROCEEDINGS OF 2013 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES: RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2013, : 135 - 140
  • [50] Online Sequential Classification of Imbalanced Data by Combining Extreme Learning Machine and improved SMOTE Algorithm
    Mao, Wentao
    Wang, Jinwan
    Wang, Liyun
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,