ACURDION: An Adaptive Clustering-based Algorithm for Tracing Large-scale MPI Applications

Cited by: 0
Authors
Bahmani, Amir
Mueller, Frank
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Communication traces help developers of high-performance computing (HPC) applications understand and improve their codes. When these applications run on large-scale HPC facilities, the scalability of tracing tools becomes a challenge. To address this problem, traces can be clustered into groups of processes that exhibit similar behavior. Instead of collecting trace information from every individual node, it then suffices to collect traces from a small set of representative nodes, namely one per cluster. However, the clustering algorithms themselves need to have low overhead, scale well, and adapt to application characteristics. We devised ACURDION, an adaptive clustering algorithm for large-scale applications that traces the MPI communication of an application with O(log P) time complexity, where P is the number of processes. First, ACURDION identifies the parameters that differ across processes using a logarithmic algorithm called Adaptive Signature Building. Second, it clusters the processes based on those parameters. Experiments show that collecting the traces of just nine nodes (one per cluster) suffices to capture the communication behavior of all nodes while retaining sufficient accuracy of trace events and parameters. In summary, ACURDION improves the scalability and automation of tracing over prior approaches.
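The abstract does not spell out how Adaptive Signature Building works. As a rough illustration only, the Python sketch below shows one generic way to find the parameters that differ across P processes in ceil(log2 P) pairwise-merge rounds, then cluster the processes on those parameters and keep one representative per cluster. It is not ACURDION's algorithm. The signature format (a fixed-length tuple of parameter values, equal in length across ranks), the helper names `tree_reduce_varying` and `cluster_by_varying`, the sample data, and the choice of the lowest rank as representative are all hypothetical. The reduction tree is simulated sequentially rather than run over MPI.

```python
from math import ceil, log2


def tree_reduce_varying(signatures):
    """Simulate a binary-tree reduction that finds which parameter positions
    differ across all signatures.

    Assumption (hypothetical format): every signature is a tuple of the same
    length holding one parameter value per position. Returns the set of
    differing positions and the number of merge rounds performed.
    """
    if not signatures:
        raise ValueError("need at least one signature")
    # A partial result is (reference signature, positions known to differ
    # within the group of ranks it summarizes).
    partials = [(sig, frozenset()) for sig in signatures]
    rounds = 0
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            if i + 1 == len(partials):
                merged.append(partials[i])  # odd group carries over unchanged
                continue
            (ref_a, diff_a), (ref_b, diff_b) = partials[i], partials[i + 1]
            # Positions outside diff_a/diff_b are constant within each group,
            # so they differ in the union only if the two references disagree.
            new_diff = {k for k, (x, y) in enumerate(zip(ref_a, ref_b)) if x != y}
            merged.append((ref_a, diff_a | diff_b | new_diff))
        partials = merged
        rounds += 1
    return partials[0][1], rounds


def cluster_by_varying(signatures, varying):
    """Group ranks whose values agree on the varying positions and map the
    lowest rank of each group (hypothetical representative) to its members."""
    keys = sorted(varying)
    clusters = {}
    for rank, sig in enumerate(signatures):
        clusters.setdefault(tuple(sig[k] for k in keys), []).append(rank)
    return {members[0]: members for members in clusters.values()}


P = 8
# Hypothetical signatures: (message size, peer offset, tag) for each rank.
sigs = [(4096 if r == 0 else 1024, 1 if r < P - 1 else -1, 7) for r in range(P)]
varying, rounds = tree_reduce_varying(sigs)
assert rounds == ceil(log2(P))
print(sorted(varying), rounds)  # [0, 1] 3
reps = cluster_by_varying(sigs, varying)
print(reps)  # {0: [0], 1: [1, 2, 3, 4, 5, 6], 7: [7]}
```

In a real MPI setting, each merge level would run concurrently across ranks (for example inside a reduction), so the depth is logarithmic in P even though this single-process simulation touches all P signatures. With the sample data, rank 0 (a root sending larger messages) and rank 7 (a boundary rank with a negative peer offset) each form their own cluster, while ranks 1 to 6 share one, so only three ranks would need full tracing.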
Pages: 785-792
Page count: 8