Block-distributed Gradient Boosted Trees

Cited by: 3
Authors
Vasiloudis, Theodore [1 ,4 ]
Cho, Hyunsu [2 ]
Bostrom, Henrik [3 ]
Affiliations
[1] RISE AI, Stockholm, Sweden
[2] Amazon Web Services, Seattle, WA, USA
[3] KTH Royal Institute of Technology, Stockholm, Sweden
[4] Amazon, Seattle, WA, USA
Keywords
Gradient Boosted Trees; Distributed Systems; Communication Efficiency; Scalability
DOI
10.1145/3331184.3331331
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the number of data points and not the number of features, and increasing communication cost for high-dimensional data. In order to allow for scalability across both the data point and feature dimensions, and reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and adapting the Quickscorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we are able to achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we are able to reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication.
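To make the communication-efficiency argument concrete, below is a minimal, hypothetical Python sketch of how a single worker in a block-distributed layout might build gradient histograms over its (row block, feature block) slice of a sparse dataset, touching only the non-zero entries. The function and parameter names (local_histograms, bin_index) are illustrative assumptions; this is not the authors' implementation and it does not cover the Quickscorer adaptation mentioned above.

    # Hypothetical sketch: one worker holds one (row block, feature block) of a
    # CSR-encoded sparse dataset and builds gradient/Hessian histograms for its
    # feature block by iterating only the non-zero entries. Workers that share a
    # feature block would then sum their histograms (e.g. via an all-reduce or a
    # parameter server) before choosing a split, so compact histograms rather
    # than raw feature columns cross the network.
    import numpy as np
    from scipy import sparse

    def local_histograms(X_block, grad_block, hess_block, n_bins, bin_index):
        """Gradient/Hessian histograms for one block of the block-distributed data."""
        n_features = X_block.shape[1]
        G = np.zeros((n_features, n_bins))   # per-bin sums of first-order gradients
        H = np.zeros((n_features, n_bins))   # per-bin sums of second-order gradients
        coo = X_block.tocoo()                # visit non-zeros only: the sparsity savings
        for i, j, v in zip(coo.row, coo.col, coo.data):
            b = bin_index(j, v)              # map raw feature value to a histogram bin
            G[j, b] += grad_block[i]
            H[j, b] += hess_block[i]
        return G, H

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = sparse.random(100, 8, density=0.1, format="csr", random_state=0)  # toy block
        g = rng.normal(size=100)             # gradients for this row block
        h = np.ones(100)                     # Hessians (constant for squared loss)
        G, H = local_histograms(X, g, h, n_bins=4,
                                bin_index=lambda j, v: min(int(v * 4), 3))
        print(G.shape, H.shape)              # (8, 4) (8, 4)

The sketch covers only the sparsity-aware histogram side; merging results across workers and scoring the trained trees are the parts the paper addresses with its block-distributed design and Quickscorer adaptation.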
Pages: 1025-1028
Number of pages: 4
Related Papers
50 records in total
  • [31] Zhou, Weihong; Eckler, Spencer; Barszczyk, Andrew; Waese-Perlman, Alex; Wang, Yingjie; Gu, Xiaoping; Feng, Zhong-Ping; Peng, Yuzhu; Lee, Kang. Waist circumference prediction for epidemiological research using gradient boosted trees. BMC Medical Research Methodology, 21
  • [32] Sekhar, J. C.; Priyanka, R.; Nanda, Ashok Kumar; Josephson, P. Joel; Ebinezer, M. J. D.; Devi, T. Kalavathi. Stochastic gradient boosted distributed decision trees security approach for detecting cyber anomalies and classifying multiclass cyber-attacks. Computers and Security, 2025, 151
  • [33] Brophy, Jonathan; Lowd, Daniel. Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022
  • [34] Fan, Chao; Liu, Diwei; Huang, Rui; Chen, Zhigang; Deng, Lei. PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility. BMC Bioinformatics, 2016, 17
  • [35] Gupta, Aditya; Gusain, Kunal; Popli, Bhavya. Verifying the Value and Veracity of eXtreme Gradient Boosted Decision Trees on a Variety of Datasets. 2016 11th International Conference on Industrial and Information Systems (ICIIS), 2016: 457-462
  • [36] Zhang, Yi; Zhang, Xiaofei; Lane, Andrew N.; Fan, Teresa W-M; Liu, Jinze. Inferring Gene Regulatory Networks of Metabolic Enzymes Using Gradient Boosted Trees. IEEE Journal of Biomedical and Health Informatics, 2020, 24(05): 1528-1536
  • [37] Martinek, Peter; Krammer, Oliver. Optimising pin-in-paste technology using gradient boosted decision trees. Soldering & Surface Mount Technology, 2018, 30(03): 164-170
  • [38] Hubacek, Ondrej; Sourek, Gustav; Zelezny, Filip. Learning to predict soccer results from relational data with gradient boosted trees. Machine Learning, 2019, 108(01): 29-47
  • [39] Iranzad, Reza; Liu, Xiao; Chaovalitwongse, W. Art; Hippe, Daniel; Wang, Shouyi; Han, Jie; Thammasorn, Phawis; Zeng, Jing; Duan, Chunyan; Bowen, Stephen. Gradient boosted trees for spatial data and its application to medical imaging data. IISE Transactions on Healthcare Systems Engineering, 2022, 12(03): 165-179
  • [40] Hubáček, Ondřej; Šourek, Gustav; Železný, Filip. Learning to predict soccer results from relational data with gradient boosted trees. Machine Learning, 2019, 108: 29-47