Autonomic failure prediction based on manifold learning for large-scale distributed systems

Cited by: 6
Authors
Lu X. [1]
Wang H.-Q. [1]
Zhou R.-J. [1]
Ge B.-Y. [1]
Affiliations
[1] College of Computer Science and Technology, Harbin Engineering University
Funding
National Natural Science Foundation of China
Keywords
autonomic computing; failure prediction; locally linear embedding; manifold learning
DOI
10.1016/S1005-8885(09)60497-0
Abstract
This article investigates autonomic failure prediction in large-scale distributed systems, using nonlinear dimensionality reduction to extract failure features automatically. Most existing methods for failure prediction focus on building prediction models or heuristic rules by discovering failure patterns, but the feature extraction step that precedes failure pattern recognition is rarely considered because of the increasing complexity of modern distributed systems. In this work, a novel performance-centric approach to automating failure prediction is proposed, based on manifold learning (ML). The ML algorithm known as supervised locally linear embedding (SLLE) is applied for feature extraction. To generalize the dimensionality reduction mapping, a nonlinear mapping approximation and an optimization solution are also proposed. In the experimental work, a file transfer test bed with fault injection is developed that gathers multilevel performance metrics transparently. Based on runtime monitoring of these metrics, the SLLE method can automatically predict more than 50% of the central processing unit (CPU) and memory failures, and around 70% of the network failures. © 2010 The Journal of China Universities of Posts and Telecommunications.
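The abstract does not give the details of the SLLE step, so the following is only a minimal sketch of supervised LLE as it is commonly described: distances between samples with different class labels are inflated before neighbors are chosen, and standard LLE is then run on those neighborhoods. It is not the authors' implementation. The function name `slle_embed`, the parameters `alpha` and `reg`, and the synthetic "metrics" data are all assumptions made for illustration.

```python
import numpy as np

def slle_embed(X, y, n_neighbors=5, n_components=2, alpha=0.5, reg=1e-3):
    """Hypothetical helper: embed labeled samples X (n x d) with supervised LLE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))

    # Supervised step: push samples of different classes apart by
    # alpha * max(D); alpha = 0 reduces to ordinary (unsupervised) LLE.
    D_sup = D + alpha * D.max() * (y[:, None] != y[None, :])
    np.fill_diagonal(D_sup, np.inf)  # a point is never its own neighbor

    # Reconstruction weights: each point as an affine combination of its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D_sup[i])[:n_neighbors]
        Z = X[nbrs] - X[i]                 # neighbors centered on point i
        C = Z @ Z.T                        # local Gram matrix
        scale = np.trace(C) if np.trace(C) > 0 else 1.0
        C = C + reg * scale * np.eye(n_neighbors)  # regularize for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()           # weights sum to one

    # Embedding: eigenvectors of M = (I - W)^T (I - W) with the smallest
    # eigenvalues, skipping the first one.
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1]

if __name__ == "__main__":
    # Synthetic stand-in for monitored performance metrics (assumption):
    # 40 samples, 6 metrics, labels 0 = normal, 1 = failure-prone.
    rng = np.random.default_rng(0)
    metrics = rng.normal(size=(40, 6))
    labels = np.repeat([0, 1], 20)
    features = slle_embed(metrics, labels)
    print(features.shape)
```

This sketch only embeds the labeled training samples. Mapping unseen runtime samples into the same space needs a separate generalization step, which the abstract mentions but does not specify.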
Pages: 116-124
Number of pages: 8