ML Training with Cloud GPU Shortages: Is Cross-Region the Answer?

Cited by: 2
Authors
Strati, Foteini [1]
Elvinger, Paul [1]
Kerimoglu, Tolga [1]
Klimovic, Ana [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
Machine learning; Cloud computing
DOI
10.1145/3642970.3655843
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The widespread adoption of ML has led to high demand for GPU hardware and, consequently, severe shortages of GPUs in the public cloud. Allocating enough GPUs in a single cloud region to train or fine-tune today's large ML models is often difficult. Users can access more GPUs if they are willing to run an ML training job on devices across different geographical regions. However, GPU nodes in different regions are connected with lower network bandwidth, and cloud providers charge extra for data transfers across geographical regions. In this work, we explore when and how it makes sense to leverage GPUs across zones and regions for distributed ML training. We analyze the throughput and cost impact of cross-region training based on the computation and communication patterns of different model parallelism strategies, develop a profile-based analytical model for estimating training throughput and cost, and provide guidelines for allocating geo-distributed resources efficiently. We find that although ML training throughput and cost with pure data parallelism degrade significantly when nodes span geographic regions, cross-region training with pipeline parallelism is practical.
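The contrast the abstract draws between data parallelism and pipeline parallelism can be illustrated with a back-of-the-envelope traffic estimate. The sketch below is not the paper's profile-based model; it only assumes that data parallelism synchronizes gradients with a ring all-reduce and that pipeline parallelism exchanges activations and their gradients at a single stage boundary placed on the cross-region link. All function names, model sizes, the bandwidth, and the per-GB price are hypothetical values chosen for illustration.

```python
"""Rough cross-region traffic estimate per training iteration.

Illustrative sketch only: every name and number here is an assumption,
not a value or method taken from the paper.
"""


def dp_bytes_per_iter(param_bytes: float, n_workers: int) -> float:
    """Bytes each worker sends in a ring all-reduce of the gradients.

    A ring all-reduce sends roughly 2 * (n - 1) / n times the gradient size.
    """
    return 2 * (n_workers - 1) / n_workers * param_bytes


def pp_bytes_per_iter(activation_bytes: float, n_microbatches: int) -> float:
    """Bytes crossing one pipeline stage boundary per iteration.

    Each microbatch sends activations forward and gradients of the same
    size backward, hence the factor of 2.
    """
    return 2 * activation_bytes * n_microbatches


def transfer_seconds(n_bytes: float, bandwidth_gbps: float) -> float:
    """Time to move n_bytes over a link of bandwidth_gbps (gigabits per second)."""
    return n_bytes * 8 / (bandwidth_gbps * 1e9)


def egress_cost_usd(n_bytes: float, usd_per_gb: float) -> float:
    """Data transfer charge for n_bytes at a hypothetical per-GB price."""
    return n_bytes / 1e9 * usd_per_gb


if __name__ == "__main__":
    # Hypothetical setup: 1e9 parameters in 16-bit precision, two workers,
    # one per region, so all all-reduce traffic crosses the region boundary.
    param_bytes = 1e9 * 2
    dp = dp_bytes_per_iter(param_bytes, n_workers=2)

    # Hypothetical boundary tensor: micro-batch 4, sequence length 1024,
    # hidden size 2048, 2 bytes per element, 8 microbatches per iteration.
    activation_bytes = 4 * 1024 * 2048 * 2
    pp = pp_bytes_per_iter(activation_bytes, n_microbatches=8)

    for name, nbytes in (("data parallel", dp), ("pipeline parallel", pp)):
        secs = transfer_seconds(nbytes, bandwidth_gbps=1.0)   # assumed link speed
        cost = egress_cost_usd(nbytes, usd_per_gb=0.02)       # assumed price
        print(f"{name}: {nbytes / 1e9:.2f} GB, {secs:.2f} s, ${cost:.4f} per iteration")
```

Under these assumed sizes the pipeline boundary moves several times fewer bytes per iteration than the gradient all-reduce, which is the kind of gap that makes the placement of the cross-region link matter. Real ratios depend on model size, batch configuration, and how many stage boundaries span regions.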
Pages: 107-116
Number of pages: 10