SCALANA: Automating Scaling Loss Detection with Graph Analysis

Cited by: 2
Authors
Jin, Yuyang [1]
Wang, Haojie [1]
Yu, Teng [1]
Tang, Xiongchao [1]
Hoefler, Torsten [2]
Liu, Xu [3]
Zhai, Jidong [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] North Carolina State Univ, Raleigh, NC USA
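Funding
National Key Research and Development Program of China; China Postdoctoral Science Foundation; National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Performance Analysis; Scalability Bottleneck; Root-Cause Detection; Static Analysis; PERFORMANCE; COMPRESSION;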
DOI
10.1109/SC41405.2020.00032
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis. Tracing collects all information, but at prohibitive overhead. In this work, we design SCALANA, which uses static analysis techniques to achieve the best of both worlds: it enables the analyzability of traces at a cost similar to profiling. SCALANA first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate SCALANA with real applications. Results show that our approach can effectively locate the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by SCALANA on 2,048 processes.
Pages: 14