Comprehensive techniques for multi-tenant deep learning framework on a Hadoop YARN cluster

Cited by: 0
Authors
Heo, Seoungbeom [1]
Kang, Dae-Cheol [1]
Jang, Hyeounji [1]
Lee, Hyeock-Jin [1]
Cho, Minkyoung [1]
Kim, Jik-Soo [1]
Affiliations
[1] Myongji Univ, Dept Comp Engn, Yongin, South Korea
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2023 / Vol. 26 / Issue 05
Funding
National Research Foundation, Singapore;
Keywords
Hadoop; YARN; Deep Learning; Lustre;
DOI
10.1007/s10586-022-03799-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We have designed and implemented a new data processing framework called "MeLoN" (Multi-tenant dEep Learning framework On yarN), which aims to effectively support distributed deep learning applications, which represent another type of data-intensive workload in the YARN-based Hadoop ecosystem. MeLoN is developed as a Hadoop YARN application so that it can transparently co-host existing deep learning applications with other data processing workflows. In this paper, we present comprehensive techniques that can effectively support multiple deep learning applications in a Hadoop YARN cluster by leveraging a fine-grained GPU over-provisioning policy and a high-performance parallel file system for data staging, which can improve the overall system throughput. Through extensive experiments based on representative deep learning workloads, we demonstrate that MeLoN can effectively bring together deep learning and the big data platform Hadoop by employing YARN-based resource allocation and execution mechanisms for running distributed deep learning applications. We believe that MeLoN can raise many additional interesting research issues, including profiling the expected GPU memory usage of deep learning applications and supporting more complicated deep learning jobs based on queuing systems, which can ultimately contribute to a new data processing framework in the YARN-based Hadoop ecosystem.
Pages: 2851 - 2864
Page count: 14