Accelerating a random forest classifier: multi-core, GP-GPU, or FPGA?

Cited by: 102

Authors:

Van Essen, Brian [1]

Macaraeg, Chris [1]

Gokhale, Maya [1]

Prenger, Ryan [1]

Affiliations:

[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA

Source:

2012 IEEE 20TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM) | 2012

Keywords:

FPGA; GP-GPU; OpenMP; Machine learning;

DOI:

10.1109/FCCM.2012.47

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812

Abstract:

Random forest classification is a well-known machine learning technique that generates classifiers in the form of an ensemble ("forest") of decision trees. The classification of an input sample is determined by the majority vote of the ensemble. Traditional random forest classifiers can be highly effective, but classification using a random forest is memory bound and not typically suitable for acceleration using FPGAs or GP-GPUs due to the need to traverse large, possibly irregular decision trees. Recent work at Lawrence Livermore National Laboratory has developed several variants of random forest classifiers, including the Compact Random Forest (CRF), that can generate decision trees more suitable for acceleration than traditional decision trees. Our paper compares and contrasts the effectiveness of FPGAs, GP-GPUs, and multi-core CPUs for accelerating classification using models generated by compact random forest machine learning classifiers. Taking advantage of training algorithms that can produce compact random forests composed of many small trees rather than fewer deep trees, we are able to regularize the forest such that the classification of any sample takes a deterministic amount of time. This optimization then allows us to execute the classifier in a pipelined or single-instruction, multiple-thread (SIMT) fashion. We show that FPGAs provide the highest-performance solution, but require a multi-chip / multi-board system to execute even modestly sized forests. GP-GPUs offer a more flexible solution with reasonably high performance that scales with forest size. Finally, multi-threading via OpenMP on a shared memory system was the simplest solution and provided near-linear performance that scaled with core count, but was still significantly slower than the GP-GPU and FPGA.
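The idea of a regularized forest with deterministic classification time can be illustrated with a minimal Python sketch. It is not the authors' implementation: the heap-ordered array layout, the function names `classify_tree` and `classify_forest`, and the toy forest are all assumptions made for illustration. Every tree is padded to a complete binary tree of the same depth, so each sample takes exactly `depth` comparisons per tree, which is the property that makes pipelined or SIMT execution straightforward.

```python
from collections import Counter

def classify_tree(features, thresholds, leaves, sample, depth):
    """Walk one complete binary tree stored in heap order (assumed layout).

    The loop always runs exactly `depth` times, whatever the sample is.
    """
    node = 0
    for _ in range(depth):
        go_right = sample[features[node]] > thresholds[node]
        node = 2 * node + 1 + int(go_right)  # left child: 2n+1, right child: 2n+2
    # Internal nodes occupy indices 0 .. 2**depth - 2; leaves follow them.
    return leaves[node - (2 ** depth - 1)]

def classify_forest(forest, sample, depth):
    """Return the majority class over all trees in the forest."""
    votes = Counter(
        classify_tree(features, thresholds, leaves, sample, depth)
        for features, thresholds, leaves in forest
    )
    return votes.most_common(1)[0][0]

# Hypothetical forest: three depth-2 trees over two features.
# Each tree is (feature index per node, threshold per node, class per leaf).
DEPTH = 2
forest = [
    ([0, 1, 1], [0.5, 0.5, 0.5], [0, 1, 1, 0]),
    ([1, 0, 0], [0.5, 0.5, 0.5], [0, 1, 1, 1]),
    ([0, 0, 1], [0.2, 0.1, 0.8], [0, 0, 1, 1]),
]

print(classify_forest(forest, [0.9, 0.1], DEPTH))
print(classify_forest(forest, [0.1, 0.1], DEPTH))
```

Because the trip count is fixed and the node arrays have the same shape for every tree, the inner loop has no data-dependent exit. On a GP-GPU, one thread per (sample, tree) pair would then follow the same instruction sequence, and on an FPGA each loop iteration could map to one pipeline stage.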


Pages: 232-239

Page count: 8
