A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks

Cited by: 0
Authors
Sa Wang
Yan-Hai Zhu
Shan-Pei Chen
Tian-Ze Wu
Wen-Jie Li
Xu-Sheng Zhan
Hai-Yang Ding
Wei-Song Shi
Yun-Gang Bao
Affiliations
[1] Chinese Academy of Sciences, State Key Laboratory of Computer Architecture, Institute of Computing Technology
[2] University of Chinese Academy of Sciences, Department of Computer Science
[3] Peng Cheng Laboratory
[4] Alibaba Inc.
[5] Wayne State University
Keywords
resource management; neural network; resource efficiency; tail latency
DOI
Not available
Abstract
Resource efficiency and application QoS have long been major concerns for datacenter operators, yet the two remain hard to reconcile. High resource utilization increases the risk of resource contention between co-located workloads, which causes latency-critical (LC) applications to suffer unpredictable, and even unacceptable, performance. Much prior work has explored mechanisms that protect the QoS of LC applications while improving resource efficiency. In this paper, we propose MAGI, a resource management runtime that leverages neural networks to monitor performance interference and pinpoint its root cause, and that adjusts the resource shares of the corresponding applications to ensure the QoS of LC applications. MAGI is a practical effort in an Alibaba datacenter to provide on-demand resource adjustment for applications using neural networks. The experimental results show that MAGI can reduce the performance degradation of an LC application by up to 87.3% when it is co-located with antagonist applications.
Pages: 209 - 220
Page count: 11
Related papers
50 in total
  • [21] Adaptive flow scheduling for modular datacenter networks
    Zhang, Xingyan
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2017, 10 (05) : 1142 - 1151
  • [22] Missing Data Recovery in Large-scale, Sparse Datacenter Traces: An Alibaba Case Study
    Liang, Yi
    Bi, Linfeng
    Su, Xing
    2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019, : 251 - 261
  • [23] INVENTORY MANAGEMENT USING ARTIFICIAL NEURAL NETWORKS IN A CONCRETE CASE
    Vochozka, Marek
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE INNOVATION MANAGEMENT, ENTREPRENEURSHIP AND SUSTAINABILITY (IMES 2017), 2017, : 1084 - 1094
  • [24] Adaptive neural queue management for TCP networks
    Cho, Hyun C.
    Fadali, Sami M.
    Lee, Hyunjeong
    COMPUTERS & ELECTRICAL ENGINEERING, 2008, 34 (06) : 447 - 469
  • [25] Adaptive resource management in mobile wireless cellular networks
    Hossain, M
    Hassan, M
    Sirisena, HR
    TELECOMMUNICATIONS AND NETWORKING - ICT 2004, 2004, 3124 : 394 - 399
  • [26] Adaptive Resource Management and Control in Software Defined Networks
    Tuncer, Daphne
    Charalambides, Marinos
    Clayman, Stuart
    Pavlou, George
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2015, 12 (01) : 18 - 33
  • [27] Towards Dynamic and Adaptive Resource Management for Emerging Networks
    Tuncer, Daphne
    Charalambides, Marinos
    Pavlou, George
    MECHANISMS FOR AUTONOMOUS MANAGEMENT OF NETWORKS AND SERVICES, 2010, 6155 : 93 - 97
  • [28] Cloud Datacenter Workload Prediction Using Complex-Valued Neural Networks
    Aizenberg, Igor
    Qazi, Kashifuddin
    2018 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA STREAM MINING & PROCESSING (DSMP), 2018, : 315 - 321
  • [29] An Adaptive Resource Management Scheme for Heterogeneous Wireless Networks
    Elahi, Mohammad Mamun
    Hossain, Md. Shamim
    Islam, Mohammad Mahfuzul
    2012 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2012, : 299 - 304
  • [30] Adaptive resource management for multimedia applications in wireless networks
    Banerjee, N
    Basu, K
    Das, SK
    SIXTH IEEE INTERNATIONAL SYMPOSIUM ON A WORLD OF WIRELESS MOBILE AND MULTIMEDIA NETWORKS, PROCEEDINGS, 2005, : 250 - 257