Comprehensive techniques for multi-tenant deep learning framework on a Hadoop YARN cluster

Authors
Heo, Seoungbeom [1]
Kang, Dae-Cheol [1]
Jang, Hyeounji [1]
Lee, Hyeock-Jin [1]
Cho, Minkyoung [1]
Kim, Jik-Soo [1]
Affiliations
[1] Myongji Univ, Dept Comp Engn, Yongin, South Korea
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2023 / Vol. 26 / Issue 05
Funding
National Research Foundation of Singapore;
Keywords
Hadoop; YARN; Deep Learning; Lustre;
DOI
10.1007/s10586-022-03799-6
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We have designed and implemented a new data processing framework called "MeLoN" (Multi-tenant dEep Learning framework On yarN), which aims to effectively support distributed deep learning applications, another type of data-intensive workload, in the YARN-based Hadoop ecosystem. MeLoN is developed as a Hadoop YARN application so that it can transparently co-host existing deep learning applications with other data processing workflows. In this paper, we present comprehensive techniques that effectively support multiple deep learning applications in a Hadoop YARN cluster by leveraging a fine-grained GPU over-provisioning policy and a high-performance parallel file system for data staging, both of which can improve overall system throughput. Through extensive experiments based on representative deep learning workloads, we demonstrate that MeLoN can effectively bring together deep learning and the big data platform Hadoop by employing YARN-based resource allocation and execution mechanisms for running distributed deep learning applications. We believe that MeLoN can raise many additional interesting research issues, including profiling the expected GPU memory usage of deep learning applications and supporting more complicated deep learning related jobs based on queuing systems, which can ultimately contribute to a new data processing framework in the YARN-based Hadoop ecosystem.
Pages: 2851 - 2864
Page count: 14
Related papers
50 in total
  • [21] COS: Cross-Processor Operator Scheduling for Multi-Tenant Deep Learning Inference
    Lin, Changyao
    Liu, Jie
    2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024,
  • [22] AROMA: Evaluating Deep Learning Systems for Stealthy Integrity Attacks on Multi-tenant Accelerators
    Chen, Xiangru
    Merugu, Maneesh
    Zhang, Jiaqi
    Ray, Sandip
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2023, 19 (02)
  • [23] Multi-Tenant Cross-Slice Resource Orchestration: A Deep Reinforcement Learning Approach
    Chen, Xianfu
    Zhao, Zhifeng
    Wu, Celimuge
    Bennis, Mehdi
    Liu, Hang
    Ji, Yusheng
    Zhang, Honggang
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2019, 37 (10) : 2377 - 2392
  • [24] DProbe: Profiling and Predicting Multi-tenant Deep Learning Workloads for GPU Resource Scaling
    Zhou, Zechun
    Sun, Jingwei
    Mei, Hengquan
    Sun, Peng
    Sun, Guangzhong
    EURO-PAR 2024: PARALLEL PROCESSING, PT I, EURO-PAR 2024, 2024, 14801 : 239 - 253
  • [25] Machine Learning Aided Orchestration in Multi-tenant Networks
    Natalino, Carlos
    Raza, Muhammad Rehan
    Rostami, Ahmad
    Ohlen, Peter
    Wosinska, Lena
    Monti, Paolo
    2018 IEEE PHOTONICS SOCIETY SUMMER TOPICAL MEETING SERIES (SUM), 2018, : 125 - 126
  • [26] DeepPlace: Learning to Place Applications in Multi-Tenant Clusters
    Mitra, Subrata
    Mondal, Shanka Subhra
    Sheoran, Nikhil
    Dhake, Neeraj
    Nehra, Ravinder
    Simha, Ramanuja
    APSYS'19: PROCEEDINGS OF THE 10TH ACM SIGOPS ASIA-PACIFIC WORKSHOP ON SYSTEMS, 2019, : 61 - 68
  • [27] ITADP: An inter-tenant attack detection and prevention framework for multi-tenant SaaS
    Yassin, Mohamed
    Talhi, Chamseddine
    Boucheneb, Hanifa
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2019, 49
  • [28] An Enhancement Framework for RDMA Congestion Control in Multi-tenant Datacenters
    Wang, Tianshi
    Zhang, Yiran
    Zhou, Ao
    Wang, Shangguang
    PROCEEDINGS OF 2024 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, NOMS 2024, 2024,
  • [29] Security Assessment Framework for Multi-tenant Cloud with Nested Virtualization
    Mjihil, Oussama
    Kim, Dong Seong
    Haqiq, Abdelkrim
    JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2016, 11 (02) : 87 - 96
  • [30] Daphne: A Flexible and Hybrid Scheduling Framework in Multi-Tenant Clusters
    Xia, Yiqian
    Ren, Rui
    Cai, Hongming
    Vasilakos, Athanasios V.
    Lv, Zheng
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2018, 15 (01) : 330 - 343