ACURDION: An Adaptive Clustering-based Algorithm for Tracing Large-scale MPI Applications

Authors
Bahmani, Amir
Mueller, Frank
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Communication traces help developers of high-performance computing (HPC) applications understand and improve their codes. When applications run on large-scale HPC facilities, the scalability of tracing tools becomes a challenge. To address this problem, traces can be clustered into groups of processes that exhibit similar behavior. Instead of collecting trace information from each individual node, it then suffices to collect a trace from a small set of representative nodes, namely one per cluster. However, clustering algorithms themselves need to have low overhead, be scalable, and adapt to application characteristics. We devised an adaptive clustering algorithm for large-scale applications called ACURDION that traces the MPI communication of code with O(log P) time complexity, where P is the number of processes. First, ACURDION identifies the parameters that differ across processes by using a logarithmic algorithm called Adaptive Signature Building. Second, it clusters the processes based on those parameters. Experiments show that collecting traces of just nine nodes/clusters suffices to capture the communication behavior of all nodes while retaining sufficient accuracy of trace events and parameters. In summary, ACURDION improves trace scalability and automation over prior approaches.
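The two steps named in the abstract (find the parameters that differ across processes, then cluster on them and keep one representative per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: it runs sequentially over all ranks in O(P) rather than using the logarithmic Adaptive Signature Building scheme, and the function names, the dictionary layout, and the parameter names (`op`, `count`, `neighbors`) are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def differing_parameters(traces):
    """Return the names of parameters whose values are not identical on every rank.

    `traces` maps a rank to a dict of parameter name -> value (hypothetical layout).
    """
    names = sorted({name for params in traces.values() for name in params})
    return [
        name
        for name in names
        if len({repr(params.get(name)) for params in traces.values()}) > 1
    ]


def cluster_by_signature(traces):
    """Group ranks whose differing parameters match; map representative -> members."""
    keys = differing_parameters(traces)
    clusters = defaultdict(list)
    for rank in sorted(traces):
        # The signature only contains parameters that actually vary across ranks.
        signature = tuple(repr(traces[rank].get(key)) for key in keys)
        clusters[signature].append(rank)
    # Assumption: the lowest rank in each cluster serves as its representative.
    return {members[0]: members for members in clusters.values()}


# Toy input: four ranks that differ only in their (hypothetical) neighbor count.
traces = {
    0: {"op": "MPI_Send", "count": 1024, "neighbors": 2},
    1: {"op": "MPI_Send", "count": 1024, "neighbors": 3},
    2: {"op": "MPI_Send", "count": 1024, "neighbors": 3},
    3: {"op": "MPI_Send", "count": 1024, "neighbors": 2},
}
representatives = cluster_by_signature(traces)
```

In this toy input only `neighbors` varies, so the four ranks fall into two clusters and only ranks 0 and 1 would need to be traced in full.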
Pages: 785-792
Number of pages: 8