Facilitating the HPC Data Center Host efficiency through Big Data Analytics

Cited by: 0
Authors
Rager, Jack [1]
Liu, Fang Cherry [2]
Affiliations
[1] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Partnership Adv Comp Environm PACE, Atlanta, GA 30332 USA
Keywords
High Performance Computing; Host Analysis; Unsupervised Machine Learning; Data Center
DOI
10.1109/BigData50022.2020.9378487
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Quality of service is an important feature for a High Performance Computing (HPC) center such as the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology (Georgia Tech). A user's job running at an HPC center may fail for a spectrum of reasons, and hardware and network failures are among the major contributors. Reducing the hardware failure rate can significantly increase a data center's quality of service and reduce the cost of human intervention. This is critical during PACE's transition to a fee-based service model in which uptime correlates directly with revenue. PACE runs around 9 million jobs each year, with a job failure rate of 12%. To extend the service life of hardware and to reduce potential failures and the data center's costs, we present a machine learning method for understanding the center's host usage patterns. By clustering the hosts on multiple features, we reshuffle the host list so that individual hosts are not overused over time. We build a test framework that runs complex combinations of experiments and presents ad hoc comparisons. We intend to make the machine learning method rack-aware, and we show meaningful results with rack information included.
Pages: 3280-3287
Page count: 8