The Application of high-dimensional Data Classification by Random Forest based on Hadoop Cloud Computing Platform

Cited by: 0

Authors:

Li, Chong [1]

Affiliations:

[1] Chongqing Vocat Inst Engn, Informat Engn Sch, Chongqing 402260, Peoples R China

Source:

3RD INTERNATIONAL CONFERENCE ON APPLIED ENGINEERING | 2016 / Vol. 51

Keywords:

MAPREDUCE;

DOI:

10.3303/CET1651065

Chinese Library Classification number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

High-dimensional data has a number of uncertain factors, such as sparse features, repeated features and computational complexity. The random forest algorithm is an ensemble classification method composed of numerous weak classifiers. It can overcome a number of practical problems, such as small sample size, over-learning, nonlinearity, the curse of dimensionality and local minima, and it has good application prospects in the field of high-dimensional data classification. To improve classification accuracy and computational efficiency, a novel classification method based on the Hadoop cloud computing platform is proposed. First, the Bagging algorithm is applied to the data set to obtain different data subsets. Second, the Random Forest is built by training decision trees under the MapReduce architecture. Finally, the data sets are classified by the Random Forest. In the experiment, three high-dimensional data sets are used as the subjects. The experimental results show that the classification accuracy of the proposed method is higher than that of a stand-alone Random Forest, and that computational efficiency is improved significantly.
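
The three steps named in the abstract (Bagging, training one tree per data subset as a map task, and classifying by combining the trees) can be illustrated with a minimal in-process sketch. This is not the paper's implementation and does not use Hadoop: `map_train` is a hypothetical stand-in for a mapper, `predict` stands in for a vote-aggregating reduce step, and a single-split decision stump replaces a full decision tree to keep the example short. All function names and parameters are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap(rows, rng):
    """Bagging step: draw len(rows) samples with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows, n_features, rng):
    """Assumed weak learner: the best one-threshold split found on a
    random subset of features. Returns (feature, threshold, left, right)."""
    dim = len(rows[0][0])
    best = None
    for f in rng.sample(range(dim), n_features):
        for x, _ in rows:
            t = x[f]
            left = [y for xi, y in rows if xi[f] <= t]
            right = [y for xi, y in rows if xi[f] > t]
            if not left or not right:
                continue  # split does not separate anything
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No usable split: always predict the majority label.
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def map_train(seed, rows, n_features):
    """Stand-in for one map task: bag the data, then train one learner."""
    rng = random.Random(seed)
    return train_stump(bootstrap(rows, rng), n_features, rng)


def train_forest(rows, n_trees, n_features):
    """Run the map tasks sequentially; a cluster would run them in parallel."""
    return [map_train(seed, rows, n_features) for seed in range(n_trees)]


def predict(forest, x):
    """Stand-in for the reduce step: majority vote over all learners."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: 8 features, two well-separated classes.
    rows = [([i / 20.0] * 8, 0) for i in range(20)]
    rows += [([5 + i / 20.0] * 8, 1) for i in range(20)]
    forest = train_forest(rows, n_trees=9, n_features=3)
    print(predict(forest, [-1.0] * 8), predict(forest, [10.0] * 8))
```

Because each call to `map_train` depends only on its seed and the input rows, the calls are independent of one another, which is the property that would let a MapReduce job distribute tree training across nodes.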


Pages: 385 - 390

Page count: 6
