Block-distributed Gradient Boosted Trees

Cited by: 3
Authors
Vasiloudis, Theodore [1 ,4 ]
Cho, Hyunsu [2 ]
Bostrom, Henrik [3 ]
机构
[1] RISE AI, Stockholm, Sweden
[2] Amazon Web Serv, Seattle, WA USA
[3] KTH Royal Inst Technol, Stockholm, Sweden
[4] Amazon, Seattle, WA USA
Keywords
Gradient Boosted Trees; Distributed Systems; Communication Efficiency; Scalability
DOI
10.1145/3331184.3331331
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the number of data points and not the number of features, and increasing communication cost for high-dimensional data. In order to allow for scalability across both the data point and feature dimensions, and reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and adapting the QuickScorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we are able to achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we are able to reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication.
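To make the block-distributed layout concrete, below is a minimal, hypothetical Python sketch of partitioning a sparse feature matrix into a grid of row and column blocks; in the setting the abstract describes, each block would be held by a different worker so that scale-out covers both data points and features. The function name block_partition, the use of numpy/scipy, and all block counts are assumptions for this illustration, not the authors' implementation.

# Minimal, illustrative sketch (not the paper's implementation): split a
# sparse feature matrix into a grid of row x column blocks, the data layout
# that block-distributed GBTs assume. Block counts, names, and the use of
# scipy/numpy are assumptions made for this example only.
import numpy as np
from scipy import sparse

def block_partition(X, n_row_blocks, n_col_blocks):
    """Return a dict mapping (row_block, col_block) -> sparse CSR sub-matrix.

    In a block-distributed setting each sub-matrix would live on a different
    worker, so scaling out covers both data points (rows) and features
    (columns) instead of rows only.
    """
    X = sparse.csr_matrix(X)
    row_edges = np.linspace(0, X.shape[0], n_row_blocks + 1, dtype=int)
    col_edges = np.linspace(0, X.shape[1], n_col_blocks + 1, dtype=int)
    blocks = {}
    for i in range(n_row_blocks):
        for j in range(n_col_blocks):
            blocks[(i, j)] = X[row_edges[i]:row_edges[i + 1],
                               col_edges[j]:col_edges[j + 1]]
    return blocks

# Example: 8 data points with 6 features, split into a 2 x 3 block grid.
X = sparse.random(8, 6, density=0.3, format="csr", random_state=0)
for (i, j), block in block_partition(X, 2, 3).items():
    print(f"block ({i}, {j}): shape={block.shape}, nnz={block.nnz}")

Under such a layout, each worker only needs to exchange information derived from its own sparse block, which is consistent with the communication savings the abstract reports for sparse, high-dimensional data.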
Pages: 1025-1028
Page count: 4
Related papers
50 records in total
  • [21] Estimation of inorganic crystal densities using gradient boosted trees
    Zhao, Jesse
    FRONTIERS IN MATERIALS, 2022, 9
  • [22] Estimation of the masses in the local group by gradient boosted decision trees
    Carlesi, Edoardo
    Hoffman, Yehuda
    Libeskind, Noam I.
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2022, 513 (02) : 2385 - 2393
  • [23] Waist circumference prediction for epidemiological research using gradient boosted trees
    Zhou, Weihong
    Eckler, Spencer
    Barszczyk, Andrew
    Waese-Perlman, Alex
    Wang, Yingjie
    Gu, Xiaoping
    Feng, Zhong-Ping
    Peng, Yuzhu
    Lee, Kang
    BMC MEDICAL RESEARCH METHODOLOGY, 2021, 21 (01)
  • [24] Gradient Boosted Trees and Denoising Autoencoder to Correct Numerical Wave Forecasts
    Yanchin, Ivan
    Soares, C. Guedes
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (09)
  • [25] Modeling Tick Populations: An Ecological Test Case for Gradient Boosted Trees
    Manley, William
    Tran, Tam
    Prusinski, Melissa
    Brisson, Dustin
    PEER COMMUNITY JOURNAL, 2023, 3
  • [26] Formation lithology classification using scalable gradient boosted decision trees
    Dev, Vikrant A.
    Eden, Mario R.
    COMPUTERS & CHEMICAL ENGINEERING, 2019, 128 : 392 - 404
  • [27] Wind Ramp Event Prediction with Parallelized Gradient Boosted Regression Trees
    Gupta, Saurav
    Shrivastava, Nitin Anand
    Khosravi, Abbas
    Panigrahi, Bijaya Ketan
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 5296 - 5301
  • [28] Gradient boosted decision trees reveal nuances of auditory discrimination behavior
    Griffiths, Carla S.
    Lebert, Jules M.
    Sollini, Joseph
    Bizley, Jennifer K.
    PLOS COMPUTATIONAL BIOLOGY, 2024, 20 (04)
  • [29] GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees
    Zhao, Qian
    Shi, Yue
    Hong, Liangjie
    PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'17), 2017, : 1311 - 1319
  • [30] TF Boosted Trees: A Scalable TensorFlow Based Framework for Gradient Boosting
    Ponomareva, Natalia
    Radpour, Soroush
    Hendry, Gilbert
    Haykal, Salem
    Colthurst, Thomas
    Mitrichev, Petr
    Grushetsky, Alexander
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT III, 2017, 10536 : 423 - 427