ML Training with Cloud GPU Shortages: Is Cross-Region the Answer?

Cited by: 2
Authors
Strati, Foteini [1]
Elvinger, Paul [1]
Kerimoglu, Tolga [1]
Klimovic, Ana [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
Funding
Swiss National Science Foundation
Keywords
Machine learning; Cloud computing
DOI
10.1145/3642970.3655843
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The widespread adoption of ML has led to high demand for GPU hardware and, consequently, to severe shortages of GPUs in the public cloud. Allocating a sufficient number of GPUs to train or fine-tune today's large ML models in a single cloud region is often difficult. Users can get access to more GPUs if they are willing to run an ML training job using devices across different geographical regions. However, GPU nodes in different regions are connected with lower network bandwidth, and cloud providers charge extra for data transfers across geographical regions. In this work, we explore when and how it makes sense to leverage GPUs across zones and regions for distributed ML training. We analyze the throughput and cost impact of cross-region training based on the computation and communication patterns of different model parallelism strategies, develop a profile-based analytical model for estimating training throughput and cost, and provide guidelines for allocating geo-distributed resources efficiently. We find that although ML training throughput and cost with pure data parallelism degrade significantly when nodes span geographic regions, cross-region training with pipeline parallelism is practical.
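The contrast the abstract draws between data parallelism and pipeline parallelism can be illustrated with a back-of-the-envelope estimate of per-iteration traffic. The sketch below is not the paper's analytical model. It assumes a ring all-reduce for gradient synchronization and a single pipeline stage boundary placed on the cross-region link, and it ignores any overlap of communication with computation. All names (`Link`, `data_parallel_bytes`, `pipeline_boundary_bytes`) and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A cross-region link. Both values are illustrative, not measured."""
    bandwidth_gbps: float      # usable bandwidth in gigabits per second
    egress_usd_per_gb: float   # price per gigabyte leaving a region


def transfer_seconds(num_bytes: float, link: Link) -> float:
    """Time to push num_bytes over the link, ignoring latency and overlap."""
    return num_bytes * 8 / (link.bandwidth_gbps * 1e9)


def egress_usd(num_bytes: float, link: Link) -> float:
    """Data transfer charge for num_bytes at the link's per-gigabyte price."""
    return num_bytes / 1e9 * link.egress_usd_per_gb


def data_parallel_bytes(param_bytes: float, workers: int) -> float:
    """Bytes each worker sends per iteration in a ring all-reduce of gradients."""
    return 2 * (workers - 1) / workers * param_bytes


def pipeline_boundary_bytes(activation_bytes: float, microbatches: int) -> float:
    """Bytes crossing one stage boundary per iteration:
    activations on the forward pass, activation gradients on the backward pass."""
    return 2 * activation_bytes * microbatches


if __name__ == "__main__":
    link = Link(bandwidth_gbps=1.0, egress_usd_per_gb=0.02)  # assumed values
    dp = data_parallel_bytes(param_bytes=2e9, workers=4)
    pp = pipeline_boundary_bytes(activation_bytes=8e6, microbatches=16)
    for name, nbytes in (("data parallel", dp), ("pipeline parallel", pp)):
        print(name, transfer_seconds(nbytes, link), egress_usd(nbytes, link))
```

Under these assumptions, data-parallel traffic scales with model size, while pipeline traffic scales with the activation size at the stage boundary and the number of microbatches. That difference is one plausible reading of why the abstract reports pipeline parallelism as the more practical option across regions.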
Pages: 107-116
Page count: 10