Optimizing I/O Performance Through Effective vCPU Scheduling Interference Management

Authors
Wang, Liang [1]
Yang, Jinzhe [2]
Zhai, Jidong [1]
Yang, Guangwen [1, 3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Imperial Coll London, TC Technol, London SW7 2BX, England
[3] Zhejiang Lab, Hangzhou 311121, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Interference; Cloud computing; Dynamic scheduling; Production; Task analysis; Processor scheduling; Performance evaluation; Virtualization; vCPU scheduling; I/O performance; Interference diagnosis
DOI
10.1109/TPDS.2023.3329298
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Virtual machines (VMs) rely heavily on virtual CPU (vCPU) scheduling to achieve efficient I/O performance. vCPU scheduling interference can cause inconsistent scheduling latency and degraded I/O performance, potentially compromising the services provided by affected VMs. Existing solutions have limitations, such as diagnosing interference issues inefficiently or imposing undesired side effects on cloud systems. To address these challenges, we present Otter, a holistic technique for optimizing I/O performance in the presence of vCPU scheduling interference. Otter employs innovative methods to improve the efficiency of interference diagnosis. First, we propose lightweight methods to measure the dynamic changes in scheduling latencies for co-running vCPUs, ensuring both flexibility and accuracy. Second, we propose fine-grained quantification methods to detect interference in a timely manner, with low false positive and false negative rates. Third, we identify interference patterns that aid in analyzing the root causes of interference and preventing similar issues from recurring. Otter has been operational for one year in the production cloud at the National Supercomputing Center (Wuxi). It has diagnosed and helped fix more than 470 issues related to vCPU scheduling interference, resulting in a 19.6% improvement in cloud service I/O performance with negligible overhead in production.
Pages: 2315-2330
Page count: 16