An efficient parallel algorithm for O(N^2) direct summation method and its variations on distributed-memory parallel machines

Cited by: 27
Author
Makino, J [1]
Affiliation
[1] Univ Tokyo, Sch Sci, Dept Astron, Bunkyo Ku, Tokyo 1130033, Japan
Source
NEW ASTRONOMY | 2002 / Vol. 7 / Issue 07
Funding
Japan Society for the Promotion of Science;
Keywords
celestial mechanics; stellar dynamics; methods: numerical;
DOI
10.1016/S1384-1076(02)00143-4
Chinese Library Classification
P1 [Astronomy];
Discipline classification code
0704 ;
Abstract
We present a novel, highly efficient algorithm to parallelize the O(N^2) direct summation method for N-body problems with individual timesteps on distributed-memory parallel machines such as Beowulf clusters. Previously known algorithms, in which all processors have complete copies of the N-body system, have the serious problem that the communication-computation ratio increases as we increase the number of processors, since the communication cost is independent of the number of processors. In the new algorithm, p processors are organized as a √p x √p two-dimensional array. Each processor has N/√p particles, but the data are distributed in such a way that the complete system is present if we look at any row or column consisting of √p processors. In this algorithm, the communication cost scales as N/√p, while the calculation cost scales as N^2/p. Thus, we can use a much larger number of processors without losing efficiency compared to what was practical with previously known algorithms. © 2002 Elsevier Science B.V. All rights reserved.
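The two-dimensional decomposition described in the abstract can be illustrated with a small serial emulation. This is only a sketch under stated assumptions, not the paper's implementation: the function names `block_accel` and `grid_accel`, the softening length `eps`, the unit system (G = 1), and the choice that grid cell (i, j) computes the forces on particle block i from particle block j are all hypothetical. Individual timesteps and real message passing are not modelled; the accumulation into `acc` stands in for a sum-reduction along each grid row.

```python
import random


def block_accel(targets, sources, src_mass, eps=1e-3):
    """Softened gravitational acceleration (G = 1) on each target from all sources."""
    out = []
    for tx, ty, tz in targets:
        ax = ay = az = 0.0
        for (sx, sy, sz), m in zip(sources, src_mass):
            dx, dy, dz = sx - tx, sy - ty, sz - tz
            # A particle acting on itself has dx = dy = dz = 0, so it adds nothing.
            w = m * (dx * dx + dy * dy + dz * dz + eps * eps) ** -1.5
            ax += w * dx
            ay += w * dy
            az += w * dz
        out.append((ax, ay, az))
    return out


def grid_accel(pos, mass, q, eps=1e-3):
    """Serially emulate a q x q processor grid (p = q * q processors).

    Assumed layout: cell (i, j) handles target block i and source block j,
    i.e. roughly (N/q)^2 = N^2/p pairwise interactions per cell.
    """
    n = len(pos)
    bounds = [(k * n) // q for k in range(q + 1)]
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(q):
        target_ids = range(bounds[i], bounds[i + 1])
        targets = [pos[k] for k in target_ids]
        for j in range(q):
            src = slice(bounds[j], bounds[j + 1])
            partial = block_accel(targets, pos[src], mass[src], eps)
            # Stand-in for summing partial forces across the processors of row i.
            for k, a in zip(target_ids, partial):
                for c in range(3):
                    acc[k][c] += a[c]
    return acc


random.seed(0)
N, q = 32, 4  # 32 particles on an emulated 4 x 4 grid (p = 16)
pos = [tuple(random.gauss(0.0, 1.0) for _ in range(3)) for _ in range(N)]
mass = [1.0 / N] * N

full = block_accel(pos, pos, mass)   # plain O(N^2) direct summation
tiled = grid_accel(pos, mass, q)     # same sum, split into q * q blocks
max_err = max(abs(f[c] - t[c]) for f, t in zip(full, tiled) for c in range(3))
```

In this emulation each cell touches only two blocks of N/q = N/√p particles, and performs about N^2/p interactions, which is the scaling the abstract quotes for communication and calculation respectively. The blocked result should agree with the plain direct sum up to floating-point rounding.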
Pages: 373-384
Page count: 12
Related papers
50 in total
  • [41] An improved parallel algorithm for certain Toeplitz cyclic tridiagonal systems on distributed-memory multicomputer
    Zhang, XB
    Luo, ZG
    Li, XM
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2003, 2834 : 292 - 300
  • [42] A task scheduling algorithm to package messages on distributed memory parallel machines
    Fujimoto, N
    Baba, T
    Hashimoto, T
    Hagihara, K
    FOURTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS, AND NETWORKS (I-SPAN'99), PROCEEDINGS, 1999, : 236 - 241
  • [43] AN EXPERIMENTAL-STUDY OF PARALLEL BOLTZMANN MACHINE ON 2 DISTRIBUTED-MEMORY MULTIPROCESSORS
    NANG, JH
    FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 1994, 30 (01): : 1 - 11
  • [44] Performance of parallel FDTD method for shared- and distributed-memory architectures: Application to bioelectromagnetics
    Ruiz-Cabello, Miguel N.
    Angulo, Luis M. Diaz
    Cobos Sanchez, Clemente
    Moglie, Franco
    Garcia, Salvador G.
    PLOS ONE, 2020, 15 (09):
  • [45] Interactive-rate animation generation by parallel progressive ray-tracing on distributed-memory machines
    Reisman, A
    Gotsman, C
    Schuster, A
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2000, 60 (09) : 1074 - 1102
  • [46] An efficient algorithm for parallel stiffness matrix assembling on shared memory machines
    Unterkircher, A
    Berkes, P
    Reissner, J
    SIMULATION OF MATERIALS PROCESSING: THEORY, METHODS AND APPLICATIONS, 2001, : 173 - 176
  • [47] Massively parallel implementation of a fast multipole method for distributed memory machines
    Kurzak, J
    Pettitt, BM
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2005, 65 (07) : 870 - 881
  • [48] Case study in parallel scientific computing: the boundary element method on a distributed-memory multicomputer
    IBM T.J. Watson Research Cent, Yorktown Heights, United States
    Eng Anal Boundary Elem, 3 (183-193):
  • [49] HIGH-PERFORMANCE SPECTRAL SIMULATION OF TURBULENT FLOWS IN MASSIVELY-PARALLEL MACHINES WITH DISTRIBUTED-MEMORY
    CORTESE, TA
    BALACHANDAR, S
    INTERNATIONAL JOURNAL OF SUPERCOMPUTER APPLICATIONS AND HIGH PERFORMANCE COMPUTING, 1995, 9 (03): : 187 - 204
  • [50] A Hybrid Parallel Delaunay Image-to-Mesh Conversion Algorithm Scalable on Distributed-Memory Clusters
    Feng, Daming
    Chernikov, Andrey N.
    Chrisochoides, Nikos P.
    25TH INTERNATIONAL MESHING ROUNDTABLE, 2016, 163 : 59 - 71