Optimizing I/O Performance Through Effective vCPU Scheduling Interference Management

Cited by: 0
Authors
Wang, Liang [1]
Yang, Jinzhe [2]
Zhai, Jidong [1]
Yang, Guangwen [1, 3]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Imperial Coll London, TC Technol, London SW7 2BX, England
[3] Zhejiang Lab, Hangzhou 311121, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Interference; Cloud computing; Dynamic scheduling; Production; Task analysis; Processor scheduling; Performance evaluation; Virtualization; cloud computing; vCPU scheduling; I/O performance; interference diagnosis;
DOI
10.1109/TPDS.2023.3329298
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Virtual machines (VMs) rely heavily on virtual CPU (vCPU) scheduling to achieve efficient I/O performance. vCPU scheduling interference can cause inconsistent scheduling latency and degraded I/O performance, potentially compromising the services provided by affected VMs. Existing solutions have limitations, such as diagnosing interference issues inefficiently or imposing undesired side effects on cloud systems. To address these challenges, we present Otter, a holistic technique for optimizing I/O performance in the presence of vCPU scheduling interference. Otter employs three methods to make interference diagnosis more efficient. First, we propose lightweight methods to measure the dynamic changes in scheduling latencies for co-running vCPUs, ensuring both flexibility and accuracy. Second, we propose fine-grained quantification methods to detect interference promptly, with low false positive and false negative rates. Third, we identify interference patterns that aid in analyzing the root causes of interference and in preventing similar issues from recurring. Otter has been operational for one year in the production cloud at the National Supercomputing Center (Wuxi). It has diagnosed and helped fix more than 470 issues related to vCPU scheduling interference, resulting in a 19.6% improvement in cloud service I/O performance with negligible overhead in production.
Pages: 2315 - 2330
Page count: 16
Related papers
50 in total
  • [41] Extending I/O through High Performance Data Services
    Abbasi, Hasan
    Lofstead, Jay
    Fang Zheng
    Schwan, Karsten
    Wolf, Matthew
    Klasky, Scott
    2009 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING AND WORKSHOPS, 2009, : 253 - +
  • [42] PHDFS: Optimizing I/O performance of HDFS in deep learning cloud computing platform
    Zhu, Zongwei
    Tan, Luchao
    Li, Yinzhen
    Ji, Cheng
    JOURNAL OF SYSTEMS ARCHITECTURE, 2020, 109
  • [43] Exploring I/O Management Performance in ZNS with ConfZNS++
    Doekemeijer, Krijn
    Maisenbacher, Dennis
    Ren, Zebin
    Tehrany, Nick
    Bjorling, Matias
    Trivedi, Animesh
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL SYSTEMS AND STORAGE CONFERENCE, SYSTOR 2024, 2024, : 162 - 177
  • [44] LAWC: Optimizing Write Cache Using Layout-Aware I/O Scheduling for All Flash Storage
    Ganesh, Kalidas
    Kim, Youngjae
    Debnath, Monobrata
    Park, Sungyong
    Lee, Junghee
    IEEE TRANSACTIONS ON COMPUTERS, 2017, 66 (11) : 1890 - 1902
  • [45] Optimizing Performance Outcomes for Emergency Management Personnel Through Simulation Based Training Applications
    Tarr, Ronald W.
    VIRTUAL, AUGMENTED AND MIXED REALITY, 2017, 10280 : 302 - 311
  • [46] Optimizing Risk Management for the Sustainable Performance of the Regional Innovation System in Korea through Metamediation
    Choi, Yongrok
    Lee, Eui Young
    HUMAN AND ECOLOGICAL RISK ASSESSMENT, 2009, 15 (02) : 270 - 280
  • [47] iTransformer: Using SSD to Improve Disk Scheduling for High-performance I/O
    Zhang, Xuechen
    Davis, Kei
    Jiang, Song
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 715 - 726
  • [48] TECHNIQUES FOR SCHEDULING I/O IN A HIGH-PERFORMANCE MULTIMEDIA-ON-DEMAND SERVER
    Jadav, D
    Srinilta, C
    Choudhary, A
    Berra, PB
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1995, 30 (02) : 190 - 203
  • [49] Improving File Tree Traversal Performance by Scheduling I/O Operations in User space
    Lunde, Carl Henrik
    Espeland, Havard
    Stensland, Hakon Kvale
    Halvorsen, Pal
    2009 IEEE 28TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCC 2009), 2009, : 145 - +
  • [50] AdapCK: Optimizing I/O for Checkpointing on Large-Scale High Performance Computing Systems
    Jia, Jie
    Liu, Yi
    Liu, Yanke
    Chen, Yifan
    Lin, Fang
    EURO-PAR 2024: PARALLEL PROCESSING, PT III, EURO-PAR 2024, 2024, 14803 : 342 - 355