ACURDION: An Adaptive Clustering-based Algorithm for Tracing Large-scale MPI Applications

Authors
Bahmani, Amir
Mueller, Frank
Institution
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Communication traces help developers of high-performance computing (HPC) applications understand and improve their codes. When applications run on large-scale HPC facilities, the scalability of tracing tools becomes a challenge. To address this problem, traces can be clustered into groups of processes that exhibit similar behavior. Instead of collecting trace information from each individual node, it then suffices to collect a trace from a small set of representative nodes, namely one per cluster. However, clustering algorithms themselves need to have low overhead, be scalable, and adapt to application characteristics. We devised an adaptive clustering algorithm for large-scale applications called ACURDION that traces the MPI communication of code with O(log P) time complexity, where P is the number of processes. First, ACURDION identifies the parameters that differ across processes by using a logarithmic algorithm called Adaptive Signature Building. Second, it clusters the processes based on those parameters. Experiments show that collecting traces of just nine nodes/clusters suffices to capture the communication behavior of all nodes while retaining sufficient accuracy of trace events and parameters. In summary, ACURDION improves trace scalability and automation over prior approaches.
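The two steps named in the abstract can be illustrated with a small serial simulation. This is only a sketch under assumptions, not the authors' implementation: the abstract does not give the details of Adaptive Signature Building, so the pairwise tree reduction, the function names `differing_params` and `cluster`, and the event fields (`op`, `count`, `tag`, `dest_offset`) are all hypothetical. It shows how the set of parameters that differ across P processes could be found in about log2(P) merge rounds, and how processes could then be grouped by the values of only those parameters.

```python
from collections import defaultdict


def differing_params(events):
    """Simulate a binary-tree reduction over per-rank parameter dicts.

    Assumes every rank records the same parameter names (hypothetical setup).
    Returns (set of parameter names that differ, number of merge rounds).
    """
    # Each subtree is summarized by one sample dict plus the names already
    # known to differ somewhere inside that subtree.
    states = [(dict(e), set()) for e in events]
    rounds = 0
    while len(states) > 1:
        merged = []
        for i in range(0, len(states) - 1, 2):
            (va, da), (vb, db) = states[i], states[i + 1]
            diff = da | db | {k for k in va if va[k] != vb.get(k)}
            merged.append((va, diff))
        if len(states) % 2:  # an unpaired subtree moves up unchanged
            merged.append(states[-1])
        states = merged
        rounds += 1
    return states[0][1], rounds


def cluster(events, keys):
    """Group ranks whose values agree on every differing parameter."""
    groups = defaultdict(list)
    for rank, e in enumerate(events):
        signature = tuple(e[k] for k in sorted(keys))
        groups[signature].append(rank)
    return list(groups.values())


# Hypothetical example: 8 ranks issue the same send, except that even ranks
# send to rank + 1 and odd ranks send to rank - 1.
events = [
    {"op": "MPI_Send", "count": 1024, "tag": 0,
     "dest_offset": 1 if r % 2 == 0 else -1}
    for r in range(8)
]
keys, rounds = differing_params(events)
clusters = cluster(events, keys)
representatives = [c[0] for c in clusters]  # one traced rank per cluster
print(keys, rounds, clusters, representatives)
```

In this toy setup only `dest_offset` differs, so two clusters result and tracing one rank from each would cover all eight. In a real MPI run, each merge round would be a message exchange between partner ranks rather than a loop over a list.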
Pages: 785-792
Page count: 8