Signal Processing Based Method for Real-Time Anomaly Detection in High-Performance Computing

Cited by: 1
Authors
Dey, Arunavo [1]
Islam, Tanzima [1]
Phelps, Chase [1]
Kelly, Christopher [2]
Affiliations
[1] Texas State Univ, Dept Comp Sci, San Marcos, TX 78666 USA
[2] Brookhaven Natl Lab, Comp Sci Initiat, Long Isl City, NY USA
Keywords
Real-time anomaly detection in HPC; Signal based anomaly detection; Fast Fourier Transform; CHIMBUKO
DOI
10.1109/COMPSAC57700.2023.00037
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Performance anomalies can manifest as irregular execution times or abnormal execution events for many reasons, including network congestion and resource contention. Detecting such anomalies in real-time by analyzing the details of performance traces at scale is impractical due to the sheer volume of data High-Performance Computing (HPC) applications produce. In this paper, we propose formulating HPC performance anomaly detection as a signal-processing problem where anomalies can be treated as noise. We evaluate our proposed method in comparison with two other commonly used anomaly detection techniques of varying complexity based on their detection accuracy and scalability. Since real-time in-situ anomaly detection at a large scale requires lightweight methods that can handle a large volume of streaming data, we find that our proposed method provides the best trade-off. We then implement the proposed method in CHIMBUKO, the first online, distributed, and scalable workflow-level performance trace analysis framework. We compare our proposed signal-based anomaly detection algorithm with two other methods using a function of their accuracy, F1 score, and detection overhead. Our experiments demonstrate that our proposed approach achieves a 99% improvement for the benchmark datasets and a 93% improvement with CHIMBUKO traces.
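The abstract's central idea, treating anomalies as noise in a signal, can be illustrated with a small sketch. This is not the authors' algorithm or the CHIMBUKO implementation; it is a minimal example that assumes a series of per-call execution times, a low-pass cutoff (`keep`), and a z-score threshold (`k`), all of which are hypothetical choices made here for illustration. It transforms the series with a Fast Fourier Transform, rebuilds it from the lowest-frequency components only, and flags samples that deviate strongly from that smooth reconstruction.

```python
import cmath
import math
import statistics


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + t
        out[i + n // 2] = even[i] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    conj = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in conj]


def detect_anomalies(samples, keep=4, k=3.0):
    """Return indices whose residual from a low-pass reconstruction
    exceeds k standard deviations.

    `keep` and `k` are illustrative assumptions, not values from the paper.
    """
    n = len(samples)
    spectrum = fft(samples)
    # Keep the DC term and the `keep` lowest positive/negative frequencies.
    filtered = [
        v if (i <= keep or i >= n - keep) else 0j
        for i, v in enumerate(spectrum)
    ]
    smooth = [v.real for v in ifft(filtered)]
    residual = [s - m for s, m in zip(samples, smooth)]
    mean = statistics.fmean(residual)
    std = statistics.pstdev(residual)
    if std == 0:
        return []  # the signal is fully explained by the kept frequencies
    return [i for i, r in enumerate(residual) if abs(r - mean) / std > k]


# Synthetic trace: a periodic execution-time pattern with one injected spike.
samples = [10.0 + math.sin(2 * math.pi * i / 16) for i in range(64)]
samples[37] += 8.0
flagged = detect_anomalies(samples)
print(flagged)
```

The frequency-domain filter costs O(n log n) per window, which is in the spirit of the lightweight methods the abstract calls for. A streaming deployment would additionally need windowing and a way to handle window lengths that are not powers of two, neither of which is covered here.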
Pages: 233-240
Number of pages: 8