SCALANA: Automating Scaling Loss Detection with Graph Analysis

Cited by: 2
Authors
Jin, Yuyang [1]
Wang, Haojie [1]
Yu, Teng [1]
Tang, Xiongchao [1]
Hoefler, Torsten [2]
Liu, Xu [3]
Zhai, Jidong [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] North Carolina State Univ, Raleigh, NC USA
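Funding
National Key Research and Development Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Performance Analysis; Scalability Bottleneck; Root-Cause Detection; Static Analysis; PERFORMANCE; COMPRESSION;
DOI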
10.1109/SC41405.2020.00032
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis. Tracing collects all information at prohibitive overhead. In this work, we design SCALANA, which uses static analysis techniques to achieve the best of both worlds: it enables the analyzability of traces at a cost similar to profiling. SCALANA first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate SCALANA with real applications. Results show that our approach can effectively locate the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by SCALANA on 2,048 processes.
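The idea of backtracking over a performance graph can be illustrated with a minimal sketch. This is not SCALANA's actual algorithm or data model, which the abstract does not specify. The `Vertex` fields, the `scaling_loss` metric, the `threshold` parameter, and the toy timings are all assumptions made for illustration. The sketch starts at the vertex with the largest scaling loss and repeatedly steps back to the dependency that scales worst, stopping when no dependency shows a loss.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical performance-graph vertex (not SCALANA's real structure)."""
    name: str
    time_small: float  # measured time at the smaller process count
    time_large: float  # measured time at the larger process count
    deps: list = field(default_factory=list)  # names of vertices this one waits on


def scaling_loss(v, ideal_speedup):
    # Assumed metric: how much slower the vertex is at scale than
    # perfect scaling of the small-scale time would predict.
    return v.time_large - v.time_small / ideal_speedup


def backtrack_root_cause(graph, ideal_speedup, threshold=0.0):
    # Start from the vertex with the largest scaling loss.
    current = max(graph.values(), key=lambda v: scaling_loss(v, ideal_speedup))
    path = [current.name]
    visited = {current.name}
    while True:
        # Consider only unvisited dependencies that themselves lose scaling.
        candidates = [
            graph[d] for d in current.deps
            if d not in visited
            and scaling_loss(graph[d], ideal_speedup) > threshold
        ]
        if not candidates:
            return path  # the last vertex on the path is the suspected root cause
        current = max(candidates, key=lambda v: scaling_loss(v, ideal_speedup))
        visited.add(current.name)
        path.append(current.name)


# Toy data (made up): timings at 4 and 16 processes, so ideal speedup is 4.
graph = {
    "init": Vertex("init", time_small=4.0, time_large=1.0),
    "loop_body": Vertex("loop_body", time_small=8.0, time_large=3.5, deps=["init"]),
    "allreduce": Vertex("allreduce", time_small=1.0, time_large=4.0, deps=["loop_body"]),
}
path = backtrack_root_cause(graph, ideal_speedup=4.0)
print(path)
```

In this toy graph the communication vertex shows the largest loss, but the walk ends at the computation it waits on, which is the kind of dependency that a flat profile would not expose.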
Pages: 14