A Case for Adaptive Resource Management in Alibaba Datacenter Using Neural Networks

Cited by: 0
Authors
Sa Wang
Yan-Hai Zhu
Shan-Pei Chen
Tian-Ze Wu
Wen-Jie Li
Xu-Sheng Zhan
Hai-Yang Ding
Wei-Song Shi
Yun-Gang Bao
Affiliations
[1] Chinese Academy of Sciences, State Key Laboratory of Computer Architecture, Institute of Computing Technology
[2] University of Chinese Academy of Sciences, Department of Computer Science
[3] Peng Cheng Laboratory
[4] Alibaba Inc.
[5] Wayne State University
Keywords
resource management; neural network; resource efficiency; tail latency
DOI
Not available
Abstract
Resource efficiency and application QoS have both long been major concerns of datacenter operators, yet the two remain difficult to reconcile. High resource utilization increases the risk of resource contention between co-located workloads, which causes latency-critical (LC) applications to suffer unpredictable and even unacceptable performance. Much prior work has been devoted to developing effective mechanisms that protect the QoS of LC applications while improving resource efficiency. In this paper, we propose MAGI, a resource management runtime that leverages neural networks to monitor and pinpoint the root cause of performance interference, and then adjusts the resource shares of the corresponding applications to ensure the QoS of LC applications. MAGI is a practical effort in the Alibaba datacenter to provide on-demand resource adjustment for applications using neural networks. The experimental results show that MAGI can reduce the performance degradation of an LC application by up to 87.3% when it is co-located with antagonist applications.
Pages: 209 - 220
Page count: 11
Related papers
50 in total
  • [31] Adaptive radio resource management in F/TDMA cellular networks using smart antennas
    Hartmann, C
    Eberspächer, J
    EUROPEAN TRANSACTIONS ON TELECOMMUNICATIONS, 2001, 12 (05): : 439 - 452
  • [32] Deep Neural Network for Resource Management in NOMA Networks
    Yang, Ning
    Zhang, Haijun
    Long, Keping
    Hsieh, Hung-Yun
    Liu, Jiangchuan
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020, 69 (01) : 876 - 886
  • [33] Adaptive QoS Resource Management by Using Hierarchical Distributed Classification for Future Generation Networks
    Fong, Simon
    RECENT TRENDS IN WIRELESS AND MOBILE NETWORKS, 2011, 162 : 266 - 278
  • [34] Congestion-aware adaptive forwarding in datacenter networks
    Zhang, Jiao
    Ren, Fengyuan
    Huang, Tao
    Tang, Li
    Liu, Yunjie
    COMPUTER COMMUNICATIONS, 2015, 62 : 34 - 46
  • [35] Green spine switch management for datacenter networks
    Li, Xiaolin
    Lung, Chung-Horng
    Majumdar, Shikharesh
    JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2016, 5
  • [36] Construction resource leveling using neural networks
    Savin, D
    Alkass, S
    Fazio, P
    CANADIAN JOURNAL OF CIVIL ENGINEERING, 1996, 23 (04) : 917 - 925
  • [37] Neural network-assisted decision-making for adaptive routing strategy in optical datacenter networks
    Hong, Yuanyuan
    Hong, Xuezhi
    Chen, Jiajia
    OPTICAL SWITCHING AND NETWORKING, 2022, 45
  • [38] Dynamic Topology Management in Optical Datacenter Networks
    Zhao, Yangming
    Wang, Sheng
    Luo, Shouxi
    Yu, Hongfang
    Xu, Shizhong
    Zhang, Xiaoning
    2014 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2014), 2014, : 2246 - 2251
  • [39] Green spine switch management for datacenter networks
    Xiaolin Li
    Chung-Horng Lung
    Shikharesh Majumdar
    Journal of Cloud Computing, 5
  • [40] Resources Management and Performance Analysis in Datacenter Networks
    Alshahrani, Reem
    Peyravi, Hassan
    PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017, : 1517 - 1522