An integrated compile-time/run-time software distributed shared memory system

Cited by: 14
Authors
Dwarkadas, S
Cox, AL
Zwaenepoel, W
DOI
10.1145/248209.237181
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
On a distributed-memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance of hand-coded message passing by translating data-parallel programs into message-passing programs, but efficient execution is limited to those programs for which precise analysis can be carried out. Shared memory is easier to program than message passing and its domain is not constrained by the limitations of parallelizing compilers, but it lags in performance. Our goal is to close that performance gap while retaining the benefits of shared memory. In other words, our goal is (1) to make shared memory as efficient as message passing, whether hand-coded or compiler-generated, (2) to retain its ease of programming, and (3) to retain the broader class of applications it supports. To this end, we have designed and implemented an integrated compile-time and run-time software DSM system. The programming model remains identical to the original pure run-time DSM system. No user intervention is required to obtain the benefits of our system. The compiler computes data access patterns for the individual processors. It then performs a source-to-source transformation, inserting calls into the program to inform the run-time system of the computed data access patterns. The run-time system uses this information to aggregate communication, to aggregate data and synchronization into a single message, to eliminate consistency overhead, and to replace global synchronization with point-to-point synchronization wherever possible. We extended the Parascope programming environment to perform the required analysis, and we augmented the TreadMarks run-time DSM library to take advantage of the analysis. We used six Fortran programs to assess the performance benefits: Jacobi, 3D-FFT, Integer Sort, Shallow, Gauss, and Modified Gram-Schmidt, each with two different data set sizes.
The experiments were run on an 8-node IBM SP/2 using user-space communication. Compiler optimization in conjunction with the augmented run-time system achieves substantial execution-time improvements over the base TreadMarks system, ranging from 4% to 59% on 8 processors. Relative to message-passing implementations of the same applications, the integrated compile-time/run-time system is 0-29% slower, while the base run-time system is 5-212% slower. For the five programs that XHPF could parallelize (all except IS), the execution times of the compiler-optimized shared-memory programs are within 9% of XHPF.
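The abstract describes the compiler computing each processor's data access pattern and the run-time system using that information to aggregate communication and to replace global barriers with point-to-point synchronization. The following is a minimal Python sketch of that idea for a Jacobi-style stencil, assuming a block distribution of grid rows and row-granularity sharing rather than pages. The functions `block_range`, `writer_of`, and `plan` are hypothetical illustrations; they are not the calls the compiler actually inserts into TreadMarks programs.

```python
# Minimal sketch (hypothetical names): per-processor access analysis for a
# Jacobi-style stencil, and the aggregated fetch plan a run-time DSM could use.
from collections import defaultdict


def block_range(p, nprocs, n):
    """Half-open range [lo, hi) of interior rows that processor p writes.

    Assumes an n-row grid whose rows 0 and n-1 are fixed boundaries, with the
    n-2 interior rows block-distributed as evenly as possible.
    """
    base, extra = divmod(n - 2, nprocs)
    lo = 1 + p * base + min(p, extra)
    hi = lo + base + (1 if p < extra else 0)
    return lo, hi


def writer_of(row, nprocs, n):
    """Processor that writes `row`, or None for a never-written boundary row."""
    for q in range(nprocs):
        lo, hi = block_range(q, nprocs, n)
        if lo <= row < hi:
            return q
    return None


def plan(p, nprocs, n):
    """Map each producer to the rows processor p must fetch from it.

    A 5-point stencil reads the rows p writes plus one halo row on each side.
    Grouping the remote rows by producer yields one aggregated request per
    producer, and the keys form the point-to-point synchronization set that
    could replace a global barrier between sweeps.
    """
    lo, hi = block_range(p, nprocs, n)
    if lo == hi:  # more processors than interior rows: nothing to compute
        return {}
    requests = defaultdict(list)
    for row in range(max(lo - 1, 0), min(hi + 1, n)):
        q = writer_of(row, nprocs, n)
        if q is not None and q != p:
            requests[q].append(row)
    return dict(requests)


if __name__ == "__main__":
    N, P = 18, 4  # 16 interior rows, 4 per processor
    for p in range(P):
        print(p, block_range(p, P, N), plan(p, P, N))
```

With 4 processors and 16 interior rows, processor 1 would send one request to processor 0 (row 4) and one to processor 2 (row 9), and would need to synchronize only with those two neighbors instead of with every processor at a barrier. In a real system, the producer could also piggyback the requested data on its synchronization message, which is the kind of data/synchronization aggregation the abstract mentions.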
Pages: 186-197
Page count: 12
Related papers (50 in total)
  • [41] Compile-time computation of polytime functions
    Covino, Emanuele
    Pani, Giovanni
    Scrimieri, Daniele
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2007, 13 (04) : 468 - 478
  • [42] Run-time monitoring of distributed applications
    Logean, X
    Dietrich, F
    Karamyan, H
    Koppenhöfer, S
    MIDDLEWARE'98: IFIP INTERNATIONAL CONFERENCE ON DISTRIBUTED SYSTEMS PLATFORMS AND OPEN DISTRIBUTED PROCESSING, 1998, : 459 - 474
  • [43] COMPILE-TIME PROGRAM RESTRUCTURING IN MULTIPROGRAMMED VIRTUAL MEMORY-SYSTEMS
    HARTLEY, SJ
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1988, 14 (11) : 1640 - 1644
  • [44] Reducing time cost of distributed Run-Time Infrastructure
    Zhou, Zhong
    Zhao, Qinping
    ADVANCES IN ARTIFICIAL REALITY AND TELE-EXISTENCE, PROCEEDINGS, 2006, 4282 : 969 - +
  • [45] Compile and Run-Time Support for the Parallelization of Sparse Matrix Updating Algorithms
    Gerardo Bandera
    Manuel Ujaldón
    Emilio L. Zapata
    The Journal of Supercomputing, 2000, 17 : 263 - 276
  • [46] SCHEMATIC: Compile-Time Checkpoint Placement and Memory Allocation for Intermittent Systems
    Reymond, Hugo
    Bechennec, Jean-Luc
    Briday, Mikael
    Faucou, Sebastien
    Puaut, Isabelle
    Rohou, Erven
    2024 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, CGO, 2024, : 258 - 269
  • [47] RUN-TIME ISSUES IN PROGRAM PARTITIONING ON DISTRIBUTED-MEMORY SYSTEMS
    PANDE, S
    AGRAWAL, DP
    CONCURRENCY-PRACTICE AND EXPERIENCE, 1995, 7 (05): 429 - 454
  • [48] Integrating run-time changes into system and software process enactment
    Hanh Nhi Tran
    Hajmoosaei, Mojtaba
    Percebois, Christian
    Front, Agnes
    Roncancio, Claudia
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2016, 28 (09) : 762 - 782
  • [49] Compile and run-time support for the parallelization of sparse matrix updating algorithms
    Bandera, G
    Ujaldón, M
    Zapata, EL
    JOURNAL OF SUPERCOMPUTING, 2000, 17 (03): 263 - 276
  • [50] A run-time memory protection methodology
    Seshua, Udaya
    Bussa, Nagaraju
    Vermeulen, Bart
    PROCEEDINGS OF THE ASP-DAC 2007, 2007, : 498 - +