Harnessing federated learning for anomaly detection in supercomputer nodes

Times cited: 0
Authors
Farooq, Emmen [1]
Milano, Michela [1]
Borghesi, Andrea [1]
Affiliation
[1] Univ Bologna, DISI, Bologna, Italy
Keywords
Federated learning; Anomaly detection; High-performance computing; Data center; Machine learning
DOI
10.1016/j.future.2024.07.052
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
High-performance computing (HPC) systems are a crucial component of modern society, with a significant impact in areas ranging from economics to scientific research, thanks to their unrivaled computational capabilities. For this reason, the number of HPC installations worldwide is trending steeply upwards, with no sign of slowing down. However, these machines are complex (comprising millions of heterogeneous components), hard to manage effectively, and very costly, both in economic investment and in energy consumption. Therefore, maximizing their productivity is of paramount importance. For instance, anomalies and faults can cause significant downtime because they are difficult to detect promptly: there are potentially many sources of issues that can prevent computing nodes from functioning correctly. In recent years, several data-driven methods have been proposed to detect anomalies in HPC systems automatically, exploiting the fact that modern supercomputers are typically equipped with fine-grained monitoring infrastructures that collect data characterizing the system behavior. Machine Learning (ML) models can thus be trained to distinguish automatically between normal and anomalous states. In this paper, we contribute to this line of research with a novel intuition, namely exploiting Federated Learning (FL) to improve the accuracy of anomaly detection models for HPC nodes. Although FL is not typically used in the HPC context, we show that FL can boost several types of underlying ML models, from supervised to unsupervised ones. We demonstrate our approach on a production Tier-0 supercomputer hosted in Italy. Applying FL to anomaly detection improves the average F-score from 0.46 to 0.87. Our research also shows that FL can reduce the data collection time required to build a representative data set, enabling faster deployment of anomaly detection models: without FL, ML models need 5 months of training data to achieve effective anomaly detection, whereas with FL the training set shrinks 15-fold, to 1.25 weeks.
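To make the federated idea concrete, here is a minimal sketch of federated averaging (FedAvg) applied to per-node anomaly classifiers. The abstract does not state which FL algorithm or which underlying models the authors use, so everything below is an illustrative assumption, not the paper's implementation: each hypothetical compute node trains a logistic-regression anomaly classifier on its own private monitoring data, and a server averages the node models weighted by their local sample counts, so raw data never leaves a node. The synthetic data generator `make_node_data` and all other function names are hypothetical; the F-score helper computes the standard F1 metric quoted in the abstract.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    # Append a constant column so the last weight acts as the bias term.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def local_train(w, X, y, lr=0.1, epochs=20):
    """Full-batch gradient descent for logistic regression on one node's
    private data, starting from the current global weights."""
    w = w.copy()
    Xb = add_bias(X)
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def fed_avg(weights, sizes):
    """FedAvg aggregation: average node weights, weighted by local sample count."""
    sizes = np.asarray(sizes, dtype=float)
    stacked = np.stack(weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def federated_training(clients, n_features, rounds=30):
    """Each round: broadcast global weights, train locally on every node,
    then aggregate. Only model weights are exchanged, never raw data."""
    w_global = np.zeros(n_features + 1)
    for _ in range(rounds):
        local = [local_train(w_global, X, y) for X, y in clients]
        w_global = fed_avg(local, [len(y) for _, y in clients])
    return w_global


def predict(w, X, threshold=0.5):
    # 1 = anomalous, 0 = normal.
    return (sigmoid(add_bias(X) @ w) >= threshold).astype(int)


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the anomalous class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_node_data(rng, n=40, anomaly_rate=0.2, n_features=4):
    """Hypothetical synthetic monitoring data: normal samples centered at 0,
    anomalous samples shifted by +3 in every feature."""
    y = (rng.random(n) < anomaly_rate).astype(float)
    X = rng.normal(0.0, 1.0, size=(n, n_features)) + 3.0 * y[:, None]
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clients = [make_node_data(rng) for _ in range(5)]  # five small nodes
    X_test, y_test = make_node_data(rng, n=500)
    w = federated_training(clients, n_features=4)
    print("F1 on held-out data:", round(f1_score(y_test, predict(w, X_test)), 3))
```

Weighting the average by sample count means nodes with more monitoring data pull the shared model proportionally harder, which is the usual FedAvg choice; the paper may weight or aggregate differently.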
Pages: 673-685
Number of pages: 13