Block-distributed Gradient Boosted Trees

Cited by: 3
Authors
Vasiloudis, Theodore [1, 4]
Cho, Hyunsu [2]
Bostrom, Henrik [3]
Affiliations
[1] RISE AI, Stockholm, Sweden
[2] Amazon Web Services, Seattle, WA, USA
[3] KTH Royal Institute of Technology, Stockholm, Sweden
[4] Amazon, Seattle, WA USA
Keywords
Gradient Boosted Trees; Distributed Systems; Communication Efficiency; Scalability
DOI
10.1145/3331184.3331331
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the number of data points and not the number of features, and increasing communication cost for high-dimensional data. In order to allow for scalability across both the data point and feature dimensions, and reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and adapting the Quickscorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we are able to achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we are able to reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication.
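To make the block-distributed layout mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy sparse dataset stored as one `{feature_index: value}` dictionary per data point and splits it across a grid of row blocks and feature blocks, so that each worker holds only the non-zero entries of its block. The function name `block_partition` and its parameters are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def block_partition(rows, n_features, row_workers, feature_workers):
    """Assign each non-zero entry to a (row block, feature block) pair.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Ceiling division so every row and feature falls inside some block.
    rows_per_block = -(-len(rows) // row_workers)
    feats_per_block = -(-n_features // feature_workers)
    blocks = defaultdict(dict)
    for i, row in enumerate(rows):
        for j, value in row.items():
            key = (i // rows_per_block, j // feats_per_block)
            blocks[key][(i, j)] = value  # only non-zero entries are stored
    return dict(blocks)


# Toy sparse data: 4 data points, 6 features, {feature_index: value} per row.
rows = [{0: 1.0, 5: 2.0}, {1: 3.0}, {4: 1.5}, {2: 0.5, 3: 4.0}]
blocks = block_partition(rows, n_features=6, row_workers=2, feature_workers=2)

for key in sorted(blocks):
    print(key, blocks[key])
```

In a row-distributed layout, by contrast, `feature_workers` would be fixed at 1, so every worker would have to handle all features; splitting along both dimensions is what the abstract describes as scaling across data points and features.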
Pages: 1025-1028
Page count: 4
Related papers
50 in total
  • [41] Automated proton track identification in MicroBooNE using gradient boosted decision trees
    Woodruff, Katherine
    18TH INTERNATIONAL WORKSHOP ON ADVANCED COMPUTING AND ANALYSIS TECHNIQUES IN PHYSICS RESEARCH (ACAT2017), 2018, 1085
  • [42] PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility
    Chao Fan
    Diwei Liu
    Rui Huang
    Zhigang Chen
    Lei Deng
    BMC Bioinformatics, 17
  • [43] Sector categorization using gradient boosted trees trained on fundamental firm data
    Fang, Ming
    Kuo, Lilian
    Shih, Frank
    Taylor, Stephen
    ALGORITHMIC FINANCE, 2020, 8 (3-4) : 91 - 99
  • [44] An Architecture as an Alternative to Gradient Boosted Decision Trees for Multiple Machine Learning Tasks
    Du, Lei
    Song, Haifeng
    Xu, Yingying
    Dai, Songsong
    ELECTRONICS, 2024, 13 (12)
  • [45] House price prediction with gradient boosted trees under different loss functions
    Hjort, Anders
    Pensar, Johan
    Scheel, Ida
    Sommervoll, Dag Einar
    JOURNAL OF PROPERTY RESEARCH, 2022, 39 (04) : 338 - 364
  • [46] Scalable probabilistic forecasting in retail with gradient boosted trees: A practitioner's approach
    Long, Xueying
    Bui, Quang
    Oktavian, Grady
    Schmidt, Daniel F.
    Bergmeir, Christoph
    Godahewa, Rakshitha
    Lee, Seong Per
    Zhao, Kaifeng
    Condylis, Paul
    INTERNATIONAL JOURNAL OF PRODUCTION ECONOMICS, 2025, 279
  • [47] Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning
    Minixhofer, Benjamin
    Gritta, Milan
    Iacobacci, Ignacio
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 303 - 313
  • [48] GBDT-MO: Gradient-Boosted Decision Trees for Multiple Outputs
    Zhang, Zhendong
    Jung, Cheolkon
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (07) : 3156 - 3167
  • [49] Road Crashes Analysis and Prediction using Gradient Boosted and Random Forest Trees
    Elyassami, Sanaa
    Hamid, Yasir
    Habuza, Tetiana
    2020 6TH IEEE CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'20), 2020, : 520 - 525
  • [50] Cyanotoxin level prediction in a reservoir using gradient boosted regression trees: a case study
    Paulino José García Nieto
    Esperanza García-Gonzalo
    Fernando Sánchez Lasheras
    José Ramón Alonso Fernández
    Cristina Díaz Muñiz
    Francisco Javier de Cos Juez
    Environmental Science and Pollution Research, 2018, 25 : 22658 - 22671