Comprehensive techniques for multi-tenant deep learning framework on a Hadoop YARN cluster

Cited by: 0
Authors
Heo, Seoungbeom [1]
Kang, Dae-Cheol [1]
Jang, Hyeounji [1]
Lee, Hyeock-Jin [1]
Cho, Minkyoung [1]
Kim, Jik-Soo [1]
Affiliations
[1] Myongji Univ, Dept Comp Engn, Yongin, South Korea
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2023, Vol. 26, No. 5
Funding
National Research Foundation, Singapore
Keywords
Hadoop; YARN; Deep Learning; Lustre
DOI
10.1007/s10586-022-03799-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We have designed and implemented a new data processing framework called "MeLoN" (Multi-tenant dEep Learning framework On yarN), which aims to effectively support distributed deep learning applications, another type of data-intensive workload in the YARN-based Hadoop ecosystem. MeLoN is developed as a Hadoop YARN application so that it can transparently co-host existing deep learning applications with other data processing workflows. In this paper, we present comprehensive techniques that can effectively support multiple deep learning applications in a Hadoop YARN cluster by leveraging a fine-grained GPU over-provisioning policy and a high-performance parallel file system for data staging, which can improve the overall system throughput. Through extensive experiments based on representative deep learning workloads, we demonstrate that MeLoN can effectively bring together deep learning and the big data platform Hadoop by employing YARN-based resource allocation and execution mechanisms for running distributed deep learning applications. We believe that MeLoN can raise many additional interesting research issues, including profiling the expected GPU memory usage of deep learning applications and supporting more complicated deep learning jobs based on queuing systems, which can ultimately contribute to a new data processing framework in the YARN-based Hadoop ecosystem.
Pages: 2851-2864
Page count: 14