FACT: Fast Communication Trace Collection for Parallel Applications through Program Slicing

Authors
Zhai, Jidong [1]
Sheng, Tianwei [1]
He, Jiangzhou [1]
Chen, Wenguang [1]
Zheng, Weimin [1]
Affiliation
[1] Tsinghua University, Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Beijing 100084, People's Republic of China
Source
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009
Keywords
Communication Pattern; Communication Trace; Message Passing Program; Parallel Application
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
A proper understanding of the communication patterns of parallel applications is important for optimizing application performance and designing better communication subsystems. Communication patterns can be obtained by analyzing communication traces. However, existing approaches to generating communication traces must execute the entire parallel application on a full-scale system, which is time-consuming and expensive. In this paper, we propose a novel technique, called FACT, which performs FAst Communication Trace collection for large-scale parallel applications on small-scale systems. Our idea is to reduce the original program to a program slice through static analysis, and to execute that slice to acquire the communication traces. The program slice preserves all the variables and statements in the original program that are relevant to the spatial and volume communication attributes. The idea is based on the observation that most computation and message contents in message-passing parallel applications are independent of these attributes, and can therefore be removed from the programs for the purpose of communication trace collection. We have implemented FACT and evaluated it with the NPB programs and Sweep3D. The results show that FACT preserves the spatial and volume communication attributes of the original programs and reduces resource consumption by two orders of magnitude in most cases. For example, FACT collects the communication traces of Sweep3D for 512 processes on a 4-node (32-core) platform in just 6.79 seconds, consuming 1.25 GB of memory, while the original program takes 256.63 seconds and consumes 213.83 GB of memory on a 32-node (512-core) platform. Finally, we present an application of FACT.
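The slicing idea in the abstract can be illustrated with a textbook backward slice over a toy straight-line program. This is a hypothetical sketch, not FACT's implementation: the `Stmt` record, the `backward_slice` function, and the example statements are invented for illustration. It assumes that "spatial" attributes mean message destinations and "volume" attributes mean message counts, and it ignores control flow, loops, and procedure calls, which a real static analysis must handle. Starting from the destination and count arguments of each communication call, it walks backward and keeps only the statements that define a variable those arguments depend on, so the computation that produces the message contents is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """One statement of a toy straight-line program (hypothetical IR)."""
    text: str
    defs: set[str] = field(default_factory=set)       # variables written
    uses: set[str] = field(default_factory=set)       # variables read
    comm_uses: set[str] = field(default_factory=set)  # reads that fix dest/count of a comm call
    is_comm: bool = False                             # True for communication calls


def backward_slice(program: list[Stmt]) -> list[Stmt]:
    """Keep communication calls and every statement their dest/count depend on."""
    relevant: set[str] = set()  # variables whose values must be preserved
    kept: list[Stmt] = []
    for stmt in reversed(program):
        if stmt.is_comm:
            # Always keep the call, but track only its dest/count arguments,
            # not the message buffer (assumed not to affect the trace).
            kept.append(stmt)
            relevant |= stmt.comm_uses
        elif stmt.defs & relevant:
            # This statement produces a value the trace depends on:
            # kill its definitions, then add the variables it reads.
            kept.append(stmt)
            relevant -= stmt.defs
            relevant |= stmt.uses
    return list(reversed(kept))


# Hypothetical example program: one process sends an array to its neighbor.
program = [
    Stmt("rank = comm_rank()", defs={"rank"}),
    Stmt("nprocs = comm_size()", defs={"nprocs"}),
    Stmt("n = 1024", defs={"n"}),
    Stmt("a = init_array(n)", defs={"a"}, uses={"n"}),
    Stmt("a = heavy_compute(a)", defs={"a"}, uses={"a"}),
    Stmt("dest = (rank + 1) % nprocs", defs={"dest"}, uses={"rank", "nprocs"}),
    Stmt("send(a, count=n, dest=dest)", uses={"a", "n", "dest"},
         comm_uses={"n", "dest"}, is_comm=True),
]

if __name__ == "__main__":
    for stmt in backward_slice(program):
        print(stmt.text)
```

Running it keeps the rank, size, count, and destination computations plus the `send` call, while both statements that build and update `a` are removed. The sketch leaves `a` undefined in the slice; how a real tool supplies a placeholder message buffer is not shown here.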
Pages: 12