Block-distributed Gradient Boosted Trees

Cited by: 3
Authors
Vasiloudis, Theodore [1 ,4 ]
Cho, Hyunsu [2 ]
Bostrom, Henrik [3 ]
Affiliations
[1] RISE AI, Stockholm, Sweden
[2] Amazon Web Services, Seattle, WA, USA
[3] KTH Royal Institute of Technology, Stockholm, Sweden
[4] Amazon, Seattle, WA, USA
Keywords
Gradient Boosted Trees; Distributed Systems; Communication Efficiency; Scalability
DOI
10.1145/3331184.3331331
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine learning algorithms used in production, for tasks that include Click-Through Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets available today, many distributed GBT methods have been proposed. However, they all assume a row-distributed dataset, addressing scalability only with respect to the number of data points and not the number of features, and increasing communication cost for high-dimensional data. In order to allow for scalability across both the data point and feature dimensions, and reduce communication cost, we propose block-distributed GBTs. We achieve communication efficiency by making full use of the data sparsity and adapting the Quickscorer algorithm to the block-distributed setting. We evaluate our approach using datasets with millions of features, and demonstrate that we are able to achieve multiple orders of magnitude reduction in communication cost for sparse data, with no loss in accuracy, while providing a more scalable design. As a result, we are able to reduce the training time for high-dimensional data, and allow more cost-effective scale-out without the need for expensive network communication.
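To make the communication-efficiency argument concrete, below is a minimal, hypothetical Python sketch of how a single worker in a block-distributed layout might build gradient histograms over its (row block, feature block) slice of a sparse dataset, touching only the non-zero entries. The function and parameter names (local_histograms, bin_index) are illustrative assumptions; this is not the authors' implementation and it does not cover the Quickscorer adaptation mentioned above.

    # Hypothetical sketch: one worker holds one (row block, feature block) of a
    # CSR-encoded sparse dataset and builds gradient/Hessian histograms for its
    # feature block by iterating only the non-zero entries. Workers that share a
    # feature block would then sum their histograms (e.g. via an all-reduce or a
    # parameter server) before choosing a split, so compact histograms rather
    # than raw feature columns cross the network.
    import numpy as np
    from scipy import sparse

    def local_histograms(X_block, grad_block, hess_block, n_bins, bin_index):
        """Gradient/Hessian histograms for one block of the block-distributed data."""
        n_features = X_block.shape[1]
        G = np.zeros((n_features, n_bins))   # per-bin sums of first-order gradients
        H = np.zeros((n_features, n_bins))   # per-bin sums of second-order gradients
        coo = X_block.tocoo()                # visit non-zeros only: the sparsity savings
        for i, j, v in zip(coo.row, coo.col, coo.data):
            b = bin_index(j, v)              # map raw feature value to a histogram bin
            G[j, b] += grad_block[i]
            H[j, b] += hess_block[i]
        return G, H

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = sparse.random(100, 8, density=0.1, format="csr", random_state=0)  # toy block
        g = rng.normal(size=100)             # gradients for this row block
        h = np.ones(100)                     # Hessians (constant for squared loss)
        G, H = local_histograms(X, g, h, n_bins=4,
                                bin_index=lambda j, v: min(int(v * 4), 3))
        print(G.shape, H.shape)              # (8, 4) (8, 4)

The sketch covers only the sparsity-aware histogram side; merging results across workers and scoring the trained trees are the parts the paper addresses with its block-distributed design and Quickscorer adaptation.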
Pages: 1025-1028
Number of pages: 4
Related Papers
50 records in total
  • [31] Zhou, Weihong; Eckler, Spencer; Barszczyk, Andrew; Waese-Perlman, Alex; Wang, Yingjie; Gu, Xiaoping; Feng, Zhong-Ping; Peng, Yuzhu; Lee, Kang. Waist circumference prediction for epidemiological research using gradient boosted trees. BMC Medical Research Methodology, 21
  • [32] Sekhar, J. C.; Priyanka, R.; Nanda, Ashok Kumar; Josephson, P. Joel; Ebinezer, M. J. D.; Devi, T. Kalavathi. Stochastic gradient boosted distributed decision trees security approach for detecting cyber anomalies and classifying multiclass cyber-attacks. Computers and Security, 2025, 151
  • [33] Brophy, Jonathan; Lowd, Daniel. Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022
  • [34] Fan, Chao; Liu, Diwei; Huang, Rui; Chen, Zhigang; Deng, Lei. PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility. BMC Bioinformatics, 2016, 17
  • [35] Gupta, Aditya; Gusain, Kunal; Popli, Bhavya. Verifying the Value and Veracity of eXtreme Gradient Boosted Decision Trees on a Variety of Datasets. 2016 11th International Conference on Industrial and Information Systems (ICIIS), 2016: 457-462
  • [36] Zhang, Yi; Zhang, Xiaofei; Lane, Andrew N.; Fan, Teresa W-M; Liu, Jinze. Inferring Gene Regulatory Networks of Metabolic Enzymes Using Gradient Boosted Trees. IEEE Journal of Biomedical and Health Informatics, 2020, 24(05): 1528-1536
  • [37] Martinek, Peter; Krammer, Oliver. Optimising pin-in-paste technology using gradient boosted decision trees. Soldering & Surface Mount Technology, 2018, 30(03): 164-170
  • [38] Hubacek, Ondrej; Sourek, Gustav; Zelezny, Filip. Learning to predict soccer results from relational data with gradient boosted trees. Machine Learning, 2019, 108(01): 29-47
  • [39] Iranzad, Reza; Liu, Xiao; Chaovalitwongse, W. Art; Hippe, Daniel; Wang, Shouyi; Han, Jie; Thammasorn, Phawis; Zeng, Jing; Duan, Chunyan; Bowen, Stephen. Gradient boosted trees for spatial data and its application to medical imaging data. IISE Transactions on Healthcare Systems Engineering, 2022, 12(03): 165-179
  • [40] Hubáček, Ondřej; Šourek, Gustav; Železný, Filip. Learning to predict soccer results from relational data with gradient boosted trees. Machine Learning, 2019, 108: 29-47