The Application of High-Dimensional Data Classification by Random Forest Based on Hadoop Cloud Computing Platform

Cited by: 0
Author
Li, Chong [1 ]
Affiliation
[1] Chongqing Vocat Inst Engn, Informat Engn Sch, Chongqing 402260, Peoples R China
Keywords
MAPREDUCE;
DOI
10.3303/CET1651065
Chinese Library Classification (CLC) number
T [Industrial Technology];
Discipline classification code
08;
Abstract
High-dimensional data presents a number of complicating factors, such as sparse features, repeated features, and high computational complexity. The random forest algorithm is an ensemble classification method composed of numerous weak classifiers. It can overcome a number of practical problems, such as small sample size, over-learning, nonlinearity, the curse of dimensionality, and local minima, and it shows good application prospects in the field of high-dimensional data classification. To improve classification accuracy and computational efficiency, a novel classification method based on the Hadoop cloud computing platform is proposed. First, the Bagging algorithm is applied to the data set to obtain different data subsets. Second, the Random Forest is built by training the decision trees under the MapReduce architecture. Finally, the data sets are classified by the Random Forest. In the experiment, three high-dimensional data sets are used as subjects. The experimental results show that the classification accuracy of the proposed method is higher than that of a stand-alone Random Forest, and that the computational efficiency is improved significantly.
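The three steps named in the abstract (Bagging, tree training under MapReduce, classification by the forest) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the abstract gives no details of the key/value design or the tree learner, so everything here is an assumption made for illustration. The names `bootstrap`, `train_stump`, `map_train`, `reduce_forest`, and `predict_forest` are hypothetical, the weak classifier is a one-split tree (a stump) rather than a full decision tree, and Python's built-in `map` stands in for Hadoop map tasks that would run on separate nodes.

```python
import math
import random
from collections import Counter


def bootstrap(rows, rng):
    """Bagging step: draw len(rows) samples with replacement."""
    return [rng.choice(rows) for _ in rows]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, rng, n_features):
    """Weak classifier: best single split over a random feature subset."""
    dim = len(rows[0][0])
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in rng.sample(range(dim), min(n_features, dim)):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        label = majority([y for _, y in rows])
        return (0, float("inf"), label, label)
    return best[1:]


def predict_tree(tree, x):
    f, t, left_label, right_label = tree
    return left_label if x[f] <= t else right_label


def map_train(task):
    """Map task (assumed design): one bootstrap sample, one trained tree."""
    seed, rows, n_features = task
    rng = random.Random(seed)
    return train_stump(bootstrap(rows, rng), rng, n_features)


def reduce_forest(trees):
    """Reduce step (assumed design): gather the trained trees into a forest."""
    return list(trees)


def predict_forest(forest, x):
    """Classification step: majority vote over all trees."""
    return majority([predict_tree(tree, x) for tree in forest])


# Synthetic two-class data with 16 features, for demonstration only.
DIM = 16
rows = []
for i in range(10):
    rows.append(([0.1 * i + 0.01 * j for j in range(DIM)], 0))
    rows.append(([5.0 + 0.1 * i + 0.01 * j for j in range(DIM)], 1))

tasks = [(seed, rows, int(math.sqrt(DIM))) for seed in range(25)]
forest = reduce_forest(map(map_train, tasks))
print(predict_forest(forest, [0.5] * DIM), predict_forest(forest, [5.5] * DIM))
```

Because each map task depends only on its own seed and a copy of the training rows, the tasks are independent of one another, which is the property that makes tree training a natural fit for distribution across a cluster.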
Pages: 385-390
Page count: 6