A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks

Cited by: 0
Authors
Sa Wang
Yan-Hai Zhu
Shan-Pei Chen
Tian-Ze Wu
Wen-Jie Li
Xu-Sheng Zhan
Hai-Yang Ding
Wei-Song Shi
Yun-Gang Bao
Affiliations
[1] Chinese Academy of Sciences, State Key Laboratory of Computer Architecture, Institute of Computing Technology
[2] University of Chinese Academy of Sciences, Department of Computer Science
[3] Peng Cheng Laboratory
[4] Alibaba Inc.
[5] Wayne State University
Keywords
resource management; neural network; resource efficiency; tail latency
DOI
Not available
Abstract
Resource efficiency and application QoS have long been major concerns of datacenter operators, yet the two remain difficult to reconcile. High resource utilization increases the risk of resource contention between co-located workloads, which causes latency-critical (LC) applications to suffer unpredictable, and even unacceptable, performance. Much prior work has sought effective mechanisms to protect the QoS of LC applications while improving resource efficiency. In this paper, we propose MAGI, a resource management runtime that leverages neural networks to monitor performance interference and pinpoint its root cause, and then adjusts the resource shares of the corresponding applications to ensure the QoS of LC applications. MAGI is a practical effort in the Alibaba datacenter to provide on-demand resource adjustment for applications using neural networks. The experimental results show that MAGI can reduce the performance degradation of an LC application by up to 87.3% when it is co-located with antagonist applications.
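The abstract describes a loop of monitoring, identifying the interfering application, and adjusting resource shares. The following is a minimal sketch of one such control step, not MAGI's actual implementation: the `App` type, the `adjust_shares` function, the CPU-share model, the step size, and the interference scores are all hypothetical. In a MAGI-like system the scores would presumably come from a neural network; here they are passed in directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class App:
    name: str
    cpu_share: float          # fraction of CPU currently granted (hypothetical model)
    latency_critical: bool


def adjust_shares(apps: List[App],
                  scores: Dict[str, float],
                  lc_latency_ms: float,
                  qos_target_ms: float,
                  step: float = 0.1,
                  min_share: float = 0.05) -> Optional[str]:
    """Run one control step and return the throttled app's name, or None.

    `scores` maps each non-LC app name to an interference score.
    Assumption: a trained model would supply these; this sketch takes them as input.
    """
    # Monitor: do nothing while the LC app meets its latency target.
    if lc_latency_ms <= qos_target_ms:
        return None

    lc = next(a for a in apps if a.latency_critical)
    by_name = {a.name: a for a in apps}

    # Pinpoint: treat the highest-scoring co-located app as the antagonist.
    culprit = by_name[max(scores, key=scores.get)]

    # Adjust: move up to `step` of CPU share to the LC app,
    # never pushing the antagonist below `min_share`.
    delta = max(0.0, min(step, culprit.cpu_share - min_share))
    culprit.cpu_share -= delta
    lc.cpu_share += delta
    return culprit.name
```

A real runtime would repeat this step periodically and would manage more than one resource type; the single CPU share here only keeps the sketch short.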
Pages: 209-220
Page count: 11
Related papers
50 in total
  • [1] A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks
    Wang, Sa
    Zhu, Yan-Hai
    Chen, Shan-Pei
    Wu, Tian-Ze
    Li, Wen-Jie
    Zhan, Xu-Sheng
    Ding, Hai-Yang
    Shi, Wei-Song
    Bao, Yun-Gang
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2020, 35 (01) : 209 - 220
  • [2] Who Limits the Resource Efficiency of My Datacenter: An Analysis of Alibaba Datacenter Traces
    Guo, Jing
    Chang, Zihao
    Wang, Sa
    Ding, Haiyang
    Feng, Yihui
    Mao, Liang
    Bao, Yungang
    PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS 2019), 2019,
  • [3] Characterizing Co-located Datacenter Workloads: An Alibaba Case Study
    Cheng, Yue
    Chai, Zheng
    Anwar, Ali
    9TH ASIA-PACIFIC SYSTEMS WORKSHOP 2018 (APSYS'18), 2018,
  • [4] Adaptive Routing for Datacenter Networks Using Ant Colony Optimization
    Hu, Jinbin
    He, Man
    Rao, Shuying
    Wang, Yue
    Wang, Jing
    He, Shiming
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2023, PT III, 2024, 14489 : 290 - 309
  • [5] Optimizing the Resource Utilization of Datacenter Networks with OpenFlow
    Liu Bo
    Chen Ming
    Hu Chao
    Hu Hui
    Xu Bo
    CHINA COMMUNICATIONS, 2016, 13 (03) : 1 - 11
  • [6] Optimizing the Resource Utilization of Datacenter Networks with OpenFlow
    LIU Bo
    CHEN Ming
    HU Chao
    HU Hui
    XU Bo
    中国通信, 2016, 13 (03) : 1 - 11
  • [7] ABR traffic management using minimal resource allocation (neural) networks
    Soon, NH
    Sundararajan, N
    Saratchandran, P
    COMPUTER COMMUNICATIONS, 2002, 25 (01) : 9 - 20
  • [8] Improving Datacenter Operations Management using Wireless Sensor Networks
    Garefalakis, Panagiotis
    Magoutis, Kostas
    2012 IEEE INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND COMMUNICATIONS, CONFERENCE ON INTERNET OF THINGS, AND CONFERENCE ON CYBER, PHYSICAL AND SOCIAL COMPUTING (GREENCOM 2012), 2012, : 195 - 202
  • [9] Adaptive Resource Management Platform for Reconfigurable Networks
    George Dimitrakopoulos
    Klaus Moessner
    Clemens Kloeck
    David Grandblaise
    Sophie Gault
    Oriol Sallent
    Kostas Tsagkaris
    Panagiotis Demestichas
    Mobile Networks and Applications, 2006, 11 (6) : 799 - 811
  • [10] Adaptive resource management platform for reconfigurable networks
    Dimitrakopoulos, George
    Moessner, Klaus
    Kloeck, Clemens
    Grandblaise, David
    Gault, Sophie
    Sallent, Oriol
    Tsagkaris, Kostas
    Demestichas, Panagiotis
    MOBILE NETWORKS & APPLICATIONS, 2006, 11 (06): : 799 - 811