Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

Cited by: 102

Authors:

Iwasawa, Masaki [1]

Tanikawa, Ataru [1,2]

Hosono, Natsuki [1]

Nitadori, Keigo [1]

Muranushi, Takayuki [1]

Makino, Junichiro [1,3,4]

Affiliations:

[1] RIKEN Adv Inst Computat Sci, 7-1-26 Minatojima Minami Machi, Kobe, Hyogo 6500047, Japan

[2] Univ Tokyo, Dept Earth & Astron, Coll Arts & Sci, Meguro Ku, 3-8-1 Komaba, Tokyo 1538902, Japan

[3] Kobe Univ, Grad Sch Sci, Dept Planetol, Nada Ku, 1-1 Rokkodai Cho, Kobe, Hyogo 6578501, Japan

[4] Tokyo Inst Technol, Earth Life Sci Inst, Meguro Ku, 2-12-1 Ookayama, Tokyo 1528551, Japan

Source:

PUBLICATIONS OF THE ASTRONOMICAL SOCIETY OF JAPAN | 2016 / Volume 68 / Issue 04

Keywords:

dark matter; Galaxy: evolution; methods: numerical; planets and satellites: formation; SPECIAL-PURPOSE COMPUTER; SIMD INSTRUCTION SET; N-BODY SIMULATION; TREE-CODE; HYDRODYNAMICS; DYNAMICS; GALAXIES; SYSTEMS; SPH;

DOI:

10.1093/pasj/psw053

Chinese Library Classification number:

P1 [Astronomy];

Subject classification number:

0704;

Abstract:

We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
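
To make concrete the "simple, sequential and unoptimized program" of quadratic cost that the abstract mentions, the following is a minimal sketch of a direct-summation gravitational acceleration kernel. It is not FDPS code and does not use the FDPS API; the function name `direct_sum_accelerations`, the softening parameter `eps`, and the unit choice `G = 1` are assumptions made purely for illustration.

```python
import math


def direct_sum_accelerations(positions, masses, G=1.0, eps=1e-3):
    """Softened gravitational acceleration on every particle.

    Hypothetical illustration only: each particle visits every other
    particle, so the cost grows as N^2. `eps` is a Plummer-style
    softening length (an assumption, not taken from the abstract).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            s = G * masses[j] / (r2 * math.sqrt(r2))
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

In this sketch the double loop is the whole algorithm. According to the abstract, the framework's role is to let a user supply an interaction of roughly this complexity while the framework handles domain decomposition, particle exchange, and tree-based or neighbor-limited evaluation.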

Pages: 22