SCALANA: Automating Scaling Loss Detection with Graph Analysis

Cited by: 2
Authors
Jin, Yuyang [1 ]
Wang, Haojie [1 ]
Yu, Teng [1 ]
Tang, Xiongchao [1 ]
Hoefler, Torsten [2 ]
Liu, Xu [3 ]
Zhai, Jidong [1 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] North Carolina State Univ, Raleigh, NC USA
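Funding
National Key Research and Development Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Performance Analysis; Scalability Bottleneck; Root-Cause Detection; Static Analysis; PERFORMANCE; COMPRESSION;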
DOI
10.1109/SC41405.2020.00032
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overheads but does not capture the detailed dependencies needed for root-cause analysis. Tracing collects all information at prohibitive overheads. In this work, we design SCALANA, which uses static analysis techniques to achieve the best of both worlds - it enables the analyzability of traces at a cost similar to profiling. SCALANA first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate SCALANA with real applications. Results show that our approach can effectively locate the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by SCALANA on 2,048 processes.
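The abstract does not spell out how backtracking root cause detection works, so the following is only a toy sketch of the general idea under stated assumptions, not SCALANA's algorithm. It assumes a hypothetical performance graph in which each vertex stores its time at a small and a large process count, plus the vertices it waits on. The names `Vertex`, `scaling_loss`, `backtrack`, the threshold, and all timings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One node of a hypothetical performance graph (illustrative only)."""
    name: str
    time_small: float  # seconds at the smaller process count (made-up data)
    time_large: float  # seconds at the larger process count (made-up data)
    deps: list = field(default_factory=list)  # names of vertices this one waits on


def scaling_loss(v, scale_factor):
    """Excess time over ideal strong scaling, where time shrinks by scale_factor."""
    return v.time_large - v.time_small / scale_factor


def backtrack(graph, start, scale_factor, threshold):
    """Follow dependency edges from `start` through vertices that also scale badly.

    Assumption: the last vertex on the returned path is treated as the
    candidate root cause. This is a simplification chosen for the sketch.
    """
    path = [start]
    visited = {start}
    current = graph[start]
    while True:
        # Keep only unvisited dependencies whose own loss exceeds the threshold.
        candidates = [
            graph[d] for d in current.deps
            if d not in visited and scaling_loss(graph[d], scale_factor) > threshold
        ]
        if not candidates:
            return path
        # Greedily follow the dependency with the largest loss.
        current = max(candidates, key=lambda v: scaling_loss(v, scale_factor))
        visited.add(current.name)
        path.append(current.name)


# Hypothetical graph: process count grows 4x between the two runs.
graph = {
    v.name: v
    for v in [
        Vertex("allreduce", 2.0, 1.8, deps=["halo_wait"]),
        Vertex("halo_wait", 1.0, 1.2, deps=["pack_loop", "io_init"]),
        Vertex("pack_loop", 4.0, 2.0),
        Vertex("io_init", 0.4, 0.15),
    ]
}

# Start from the vertex with the largest scaling loss and walk backwards.
worst = max(graph.values(), key=lambda v: scaling_loss(v, 4.0)).name
path = backtrack(graph, worst, scale_factor=4.0, threshold=0.5)
print(" -> ".join(path))
```

In this made-up example the walk starts at the collective with the largest loss and ends at a compute loop with no poorly scaling dependencies of its own, which the sketch reports as the candidate root cause.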
Pages: 14