SCALANA: Automating Scaling Loss Detection with Graph Analysis

Cited by: 2
Authors
Jin, Yuyang [1]
Wang, Haojie [1]
Yu, Teng [1]
Tang, Xiongchao [1]
Hoefler, Torsten [2]
Liu, Xu [3]
Zhai, Jidong [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] North Carolina State Univ, Raleigh, NC USA
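Funding
National Key R&D Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China; Beijing Natural Science Foundation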
Keywords
Performance Analysis; Scalability Bottleneck; Root-Cause Detection; Static Analysis; PERFORMANCE; COMPRESSION
DOI
10.1109/SC41405.2020.00032
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis. Tracing collects all of this information, but at prohibitive overhead. In this work, we design SCALANA, which uses static analysis techniques to achieve the best of both worlds: it makes the analyzability of traces available at a cost similar to that of profiling. SCALANA first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. On this graph, we propose a novel approach, called backtracking root-cause detection, which automatically and efficiently detects the root cause of scaling loss. We evaluate SCALANA with real applications. Results show that our approach effectively locates the root cause of scaling loss in real applications and incurs 1.73% overhead on average for up to 2,048 processes. By fixing the root causes detected by SCALANA, we achieve up to 11.11% performance improvement on 2,048 processes.
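The abstract does not specify how the Program Performance Graph is laid out or how backtracking chooses its path, so the following is only a minimal sketch of the general idea under stated assumptions, not SCALANA's actual algorithm. It assumes each graph vertex (a computation or communication snippet) carries measured times at two process counts plus a list of predecessor vertices it depends on (intra-process flow or inter-process communication). It also assumes a vertex "loses scaling" when its time exceeds ideal strong scaling by a hypothetical threshold of 1.5x. Starting from every non-scaling vertex, the sketch walks backward while predecessors also fail to scale and reports the vertices where that chain ends. The names `Vertex`, `scaling_loss`, and `backtrack_root_causes`, as well as the example data, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical performance-graph vertex (not SCALANA's data structure)."""
    name: str
    kind: str                     # assumed labels: "comp" or "comm"
    times: dict                   # process count -> measured time in seconds
    preds: list = field(default_factory=list)  # names of vertices this one depends on


def scaling_loss(v, p_small, p_large):
    """Measured time at p_large divided by ideal strong-scaling time.

    A value near 1.0 means the vertex scales perfectly; larger values
    indicate scaling loss.
    """
    ideal = v.times[p_small] * p_small / p_large
    return v.times[p_large] / ideal


def backtrack_root_causes(graph, p_small, p_large, threshold=1.5):
    """Walk backward from each non-scaling vertex along dependence edges.

    A vertex is reported as a candidate root cause when it shows scaling
    loss but none of its predecessors do (assumed stopping rule).
    """
    def is_bad(name):
        return scaling_loss(graph[name], p_small, p_large) > threshold

    roots = set()
    for start in graph:
        if not is_bad(start):
            continue
        stack, seen = [start], {start}
        while stack:
            cur = stack.pop()
            bad_preds = [p for p in graph[cur].preds if is_bad(p)]
            if not bad_preds:
                roots.add(cur)  # loss cannot be traced further back
            for p in bad_preds:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
    return sorted(roots)


# Hypothetical example: a poorly scaling loop delays a halo exchange,
# which in turn delays a global reduction.
graph = {
    "init": Vertex("init", "comp", {64: 3.2, 2048: 0.1}),
    "loop_compute": Vertex("loop_compute", "comp", {64: 10.0, 2048: 1.2}),
    "halo_exchange": Vertex("halo_exchange", "comm", {64: 0.5, 2048: 0.4},
                            preds=["init", "loop_compute"]),
    "allreduce": Vertex("allreduce", "comm", {64: 0.2, 2048: 0.6},
                        preds=["halo_exchange"]),
}

print(backtrack_root_causes(graph, 64, 2048))  # ['loop_compute']
```

In this toy graph both communication vertices show scaling loss, but backtracking attributes it to `loop_compute`, the earliest non-scaling vertex in the dependence chain, while `init` scales ideally and is never flagged. A real tool would also need to handle cycles from loops and per-process imbalance, which this sketch ignores.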
Pages: 14