Accelerating a random forest classifier: multi-core, GP-GPU, or FPGA?

Cited by: 102
Authors
Van Essen, Brian [1]
Macaraeg, Chris [1]
Gokhale, Maya [1]
Prenger, Ryan [1]
Affiliations
[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
Keywords
FPGA; GP-GPU; OpenMP; Machine learning
DOI
10.1109/FCCM.2012.47
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Random forest classification is a well-known machine learning technique that generates classifiers in the form of an ensemble ("forest") of decision trees. The classification of an input sample is determined by the majority classification by the ensemble. Traditional random forest classifiers can be highly effective, but classification using a random forest is memory bound and not typically suitable for acceleration using FPGAs or GP-GPUs due to the need to traverse large, possibly irregular decision trees. Recent work at Lawrence Livermore National Laboratory has developed several variants of random forest classifiers, including the Compact Random Forest (CRF), that can generate decision trees more suitable for acceleration than traditional decision trees. Our paper compares and contrasts the effectiveness of FPGAs, GP-GPUs, and multi-core CPUs for accelerating classification using models generated by compact random forest machine learning classifiers. Taking advantage of training algorithms that can produce compact random forests composed of many small trees rather than fewer deep trees, we are able to regularize the forest such that the classification of any sample takes a deterministic amount of time. This optimization then allows us to execute the classifier in a pipelined or single-instruction, multiple-thread (SIMT) fashion. We show that FPGAs provide the highest-performance solution, but require a multi-chip / multi-board system to execute even modest-sized forests. GP-GPUs offer a more flexible solution with reasonably high performance that scales with forest size. Finally, multi-threading via OpenMP on a shared memory system was the simplest solution and provided near-linear performance that scaled with core count, but was still significantly slower than the GP-GPU and FPGA.
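The idea of a regularized forest with deterministic classification time can be illustrated with a minimal sketch. It is not the authors' CRF implementation. It assumes every tree is a complete binary tree of the same fixed depth, stored as flat arrays in heap order, so each sample takes exactly `depth` comparisons per tree. The names `classify_tree`, `classify_forest`, `DEPTH`, and `FOREST`, as well as the toy thresholds, are hypothetical.

```python
from collections import Counter

def classify_tree(features, thresholds, leaves, depth, sample):
    """Walk one complete binary tree in exactly `depth` steps.

    Internal nodes are stored in heap order: the children of node i
    are 2*i + 1 (left) and 2*i + 2 (right). Leaves follow the
    2**depth - 1 internal nodes.
    """
    idx = 0
    for _ in range(depth):  # fixed trip count, no early exit
        go_right = sample[features[idx]] > thresholds[idx]
        idx = 2 * idx + 1 + int(go_right)
    return leaves[idx - (2 ** depth - 1)]

def classify_forest(forest, depth, sample):
    """Return the majority class over all trees (ties go to the first seen)."""
    votes = Counter(
        classify_tree(f, t, l, depth, sample) for f, t, l in forest
    )
    return votes.most_common(1)[0][0]

# Toy forest (assumed values): three depth-2 trees over two features.
DEPTH = 2
FOREST = [
    ([0, 1, 1], [0.5, 0.5, 0.5], [0, 1, 1, 0]),
    ([0, 0, 0], [0.5, 0.2, 0.8], [0, 0, 1, 1]),
    ([1, 0, 0], [0.5, 0.5, 0.5], [0, 1, 1, 1]),
]

print(classify_forest(FOREST, DEPTH, (0.9, 0.1)))  # -> 1
print(classify_forest(FOREST, DEPTH, (0.1, 0.1)))  # -> 0
```

Because the loop in `classify_tree` has no data-dependent exit, every sample and every tree does the same amount of work, which is the property that makes a pipelined or SIMT mapping straightforward in principle.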
Pages: 232-239
Number of pages: 8
Related papers
50 in total
  • [21] Accelerating sequential programs on commodity multi-core processors
    Zhang, Yuanming
    Xiao, Gang
    Baba, Takanobu
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2014, 74 (04) : 2257 - 2265
  • [22] Adaptively accelerating FWM2DA seismic modelling program on multi-core CPU and GPU architectures
    Londhe, Ashutosh
    Rastogi, Richa
    Srivastava, Abhishek
    Khonde, Kiran
    Sirasala, Kirannmayi M.
    Kharche, Komal
    COMPUTERS & GEOSCIENCES, 2021, 146
  • [23] A Profiler for a Heterogeneous Multi-Core Multi-FPGA System
    Nunes, Daniel
    Saldana, Manuel
    Chow, Paul
    PROCEEDINGS OF THE 2008 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY, 2008, : 113 - +
  • [24] Algorithmic skeletons for multi-core, multi-GPU systems and clusters
    Ernsting, Steffen
    Kuchen, Herbert
    International Journal of High Performance Computing and Networking, 2012, 7 (02) : 129 - 138
  • [25] Hybrid Multi-Core Recurrent Architecture Approbation on FPGA
    Stepchenkov, Yury
    Shikunov, Yury
    Morozov, Nikolai
    Orlov, Georgy
    Khilko, Dmitry
    PROCEEDINGS OF THE 2019 IEEE CONFERENCE OF RUSSIAN YOUNG RESEARCHERS IN ELECTRICAL AND ELECTRONIC ENGINEERING (EICONRUS), 2019, : 1705 - 1708
  • [26] Application of Multi-core Parallel Computing in FPGA Placement
    Huang, Bohu
    Zhang, Haibin
    2013 2ND INTERNATIONAL SYMPOSIUM ON INSTRUMENTATION AND MEASUREMENT, SENSOR NETWORK AND AUTOMATION (IMSNA), 2013, : 884 - 889
  • [27] Multi-Core for K-Means Clustering on FPGA
    Canilho, Jose
    Vestias, Mario
    Neto, Horacio
    2016 26TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2016,
  • [28] Multi-Core FPGA Execution for Electromagnetic Simulation by FDTD
    Hayakawa, Kiyoshi
    Yamano, Ryusuke
    2015 2ND INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING ICISCE 2015, 2015, : 831 - 835
  • [29] Acceleration of Stereo-Matching on Multi-core CPU and GPU
    Xu, Tian
    Cockshott, Paul
    Oehler, Susanne
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 108 - 115
  • [30] The Research of SAR Processing Performance Based on Multi-core GPU
    Wang, Yuwei
    Li, Xingming
    Hu, Shanqing
    Yu, Jiacheng
    SIGNAL AND INFORMATION PROCESSING, NETWORKING AND COMPUTERS, 2018, 473 : 156 - 163