Block-distributed Gradient Boosted Trees

Cited by: 3
Authors
Vasiloudis, Theodore [1, 4]
Cho, Hyunsu [2]
Bostrom, Henrik [3]
Affiliations
[1] RISE AI, Stockholm, Sweden
[2] Amazon Web Services, Seattle, WA, USA
[3] KTH Royal Institute of Technology, Stockholm, Sweden
[4] Amazon, Seattle, WA USA
Keywords
Gradient Boosted Trees; Distributed Systems; Communication Efficiency; Scalability
DOI
10.1145/3331184.3331331
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the number of data points and not the number of features, and increasing communication cost for high-dimensional data. In order to allow for scalability across both the data point and feature dimensions, and reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and adapting the Quickscorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we are able to achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we are able to reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication.
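The difference between a row-distributed and a block-distributed dataset can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `block_partition`, the worker-grid layout, and the toy data are hypothetical, and the sketch only shows how sparse entries might be assigned to workers, not how training or communication is performed.

```python
from collections import defaultdict


def block_partition(entries, n_rows, n_cols, row_parts, col_parts):
    """Assign sparse (row, col, value) entries to a row_parts x col_parts worker grid.

    Hypothetical helper for illustration only. With col_parts == 1 this
    reduces to a row-distributed layout, where each worker holds every
    feature for its share of the data points.
    """
    rows_per_block = -(-n_rows // row_parts)  # ceiling division
    cols_per_block = -(-n_cols // col_parts)
    blocks = defaultdict(list)
    for r, c, v in entries:
        worker = (r // rows_per_block, c // cols_per_block)
        blocks[worker].append((r, c, v))
    return dict(blocks)


# Toy sparse dataset: 4 data points, 6 features, 5 non-zero values.
entries = [(0, 0, 1.0), (0, 5, 2.0), (1, 2, 3.0), (2, 3, 4.0), (3, 5, 5.0)]

# Row-distributed: split data points only; every worker sees all 6 features.
row_distributed = block_partition(entries, 4, 6, row_parts=2, col_parts=1)

# Block-distributed: split both data points and features.
block_distributed = block_partition(entries, 4, 6, row_parts=2, col_parts=2)

for worker, held in sorted(block_distributed.items()):
    print(worker, held)
```

In the block-distributed layout each worker holds only the non-zero entries that fall inside its row range and feature range, so a block with no non-zero values stores nothing at all in this sketch.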
Pages: 1025-1028
Page count: 4