ACURDION: An Adaptive Clustering-based Algorithm for Tracing Large-scale MPI Applications

Cited by: 0
Authors
Bahmani, Amir
Mueller, Frank
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Communication traces help developers of high-performance computing (HPC) applications understand and improve their codes. When applications run on large-scale HPC facilities, the scalability of tracing tools becomes a challenge. To address this problem, traces can be clustered into groups of processes that exhibit similar behavior. Instead of collecting trace information from each individual node, it then suffices to collect a trace from a small set of representative nodes, namely one per cluster. However, clustering algorithms themselves need to have low overhead, be scalable, and adapt to application characteristics. We devised an adaptive clustering algorithm for large-scale applications called ACURDION that traces the MPI communication of code with O(log P) time complexity, where P is the number of processes. First, ACURDION identifies the parameters that differ across processes by using a logarithmic algorithm called Adaptive Signature Building. Second, it clusters the processes based on those parameters. Experiments show that collecting traces of just nine nodes/clusters suffices to capture the communication behavior of all nodes while retaining sufficient accuracy of trace events and parameters. In summary, ACURDION improves trace scalability and automation over prior approaches.
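The two steps named in the abstract (find the parameters that differ across processes, then cluster on them) can be illustrated with a small Python sketch. This is not the authors' Adaptive Signature Building algorithm, whose details the abstract does not give. It is a serial simulation under assumed inputs: each rank is represented by a hypothetical dictionary of call parameters, and all function names (`find_varying_keys`, `signature`, `tree_cluster`, `representatives`) are invented for illustration. The only property it is meant to show is that merging per-rank cluster maps along a binary tree takes about log2(P) rounds.

```python
def find_varying_keys(all_params):
    """Keys whose values are not identical on every rank.

    Assumption: computed serially here for brevity; the abstract says the
    real algorithm finds these parameters in logarithmic time.
    """
    first = all_params[0]
    return {k for k in first if any(p[k] != first[k] for p in all_params)}


def signature(params, varying_keys):
    """Hashable signature built only from the parameters that differ."""
    return tuple(params[k] for k in sorted(varying_keys))


def tree_cluster(all_params):
    """Merge per-rank cluster maps pairwise, simulating a binary reduction tree.

    Returns (clusters, rounds), where clusters maps signature -> sorted ranks.
    """
    keys = find_varying_keys(all_params)
    # Each rank starts with a map that holds only its own signature.
    maps = {r: {signature(p, keys): [r]} for r, p in enumerate(all_params)}
    n = len(all_params)
    rounds = 0
    step = 1
    while step < n:
        # In each round, rank r absorbs the map of its partner r + step.
        for r in range(0, n, 2 * step):
            partner = r + step
            if partner < n:
                for sig, ranks in maps.pop(partner).items():
                    maps[r].setdefault(sig, []).extend(ranks)
        step *= 2
        rounds += 1
    clusters = {sig: sorted(ranks) for sig, ranks in maps[0].items()}
    return clusters, rounds


def representatives(clusters):
    """Pick one rank per cluster to trace (here: the lowest rank)."""
    return sorted(min(ranks) for ranks in clusters.values())


# Hypothetical input: 8 ranks; boundary ranks talk to 2 peers, interior to 4.
params = [{"count": 64, "tag": 0, "peers": 2 if r in (0, 7) else 4}
          for r in range(8)]
clusters, rounds = tree_cluster(params)
print(rounds, clusters, representatives(clusters))
```

In this toy input only `peers` differs across ranks, so two clusters form and two ranks would be traced instead of eight. The work done per merge grows with the number of distinct signatures, so the logarithmic round count translates into low overhead only when the number of clusters stays small.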
Pages: 785-792
Page count: 8