Acceleration of Large Integer Multiplication with Intel AVX-512 Instructions

Cited by: 3

Authors:

Edamatsu, Takuya [1]

Takahashi, Daisuke [2]

Affiliations:

[1] Univ Tsukuba, Coll Informat Sci, Tsukuba, Ibaraki, Japan

[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan

Source:

IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018

Keywords:

AVX-512; SIMD; Knights Landing; large integer multiplication; reduced-radix representation

DOI:

10.1109/HPCC/SmartCity/DSS.2018.00059

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we propose an implementation of large integer multiplication using Single Instruction Multiple Data (SIMD) instructions. We evaluated the implementation on an Intel Xeon Phi processor. The second generation Intel Xeon Phi processor, Knights Landing, has a set of Advanced Vector Extensions-512 (AVX-512) instructions. Using AVX-512, the processor can handle 512 bits at the same time and has the potential to multiply faster than a processor using Streaming SIMD Extensions (SSE) and AVX. Therefore, we applied AVX-512F (foundation) instructions to the program. In the multiplication of large integers, as the number of digits increases, various processing costs also become larger. One of these costs is carry processing. Therefore, we implemented a multiplication function using a reduced-radix representation and compared the execution time and the number of instructions against the GNU Multiple Precision Arithmetic Library (GMP). Furthermore, we used some optimization techniques for this kernel. We successfully achieved an execution time that was approximately 2.5x faster than GMP on the Knights Landing architecture.
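The carry-processing cost mentioned in the abstract can be illustrated with a minimal sketch of reduced-radix multiplication. In a reduced-radix representation each limb holds fewer bits than the machine word, so partial products can be summed in a column without propagating carries until the end. The 28-bit limb width and the function names below are assumptions chosen for illustration; they are not taken from the paper, and the sketch is scalar Python rather than an AVX-512 kernel.

```python
# Hypothetical sketch: limb width and function names are assumptions, not from the paper.
LIMB_BITS = 28                      # assumed reduced radix 2**28
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n):
    """Split a non-negative integer into little-endian 28-bit limbs."""
    limbs = []
    while True:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
        if n == 0:
            return limbs


def from_limbs(limbs):
    """Recombine little-endian limbs into a single integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_reduced_radix(a, b):
    """Schoolbook product of two limb lists with carries deferred to one final pass."""
    cols = [0] * (len(a) + len(b))
    # Accumulate partial products column by column without propagating carries.
    # Each product is below 2**56, so a 64-bit lane could hold up to 2**8 of them;
    # Python integers are unbounded, so that limit is described here, not enforced.
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            cols[i + j] += x * y
    # Single carry-propagation pass that normalizes every column back to 28 bits.
    carry = 0
    for k in range(len(cols)):
        total = cols[k] + carry
        cols[k] = total & LIMB_MASK
        carry = total >> LIMB_BITS
    assert carry == 0  # the product always fits in len(a) + len(b) limbs
    return cols
```

For example, `from_limbs(mul_reduced_radix(to_limbs(x), to_limbs(y)))` should equal `x * y` for non-negative integers `x` and `y`. The design point is that the inner loop contains only multiply-and-add steps, which is the part that maps naturally onto SIMD lanes, while carry handling is confined to one sequential pass.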

Citation

Pages: 211-218

Page count: 8

50 items in total

[31] Acceleration of Homomorphic Unrolled Trace-Type Function using AVX512 instructions
Inoue, Kotaro
Suzuki, Takuya
Yamana, Hayato
PROCEEDINGS OF THE 10TH WORKSHOP ON ENCRYPTED COMPUTING & APPLIED HOMOMORPHIC CRYPTOGRAPHY, WAHC 2022, 2022, : 47 - 52
[32] Batched Computation of the Singular Value Decompositions of Order Two by the AVX-512 Vectorization
Novakovic, Vedran
PARALLEL PROCESSING LETTERS, 2020, 30 (04)
[33] AVX-512 Based Software Decoding for 5G LDPC Codes
Xu, Yi
Wang, Wenjin
Xu, Then
Gao, Xiqi
PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2019), 2019, : 54 - 59
[34] Parallel Vectorized Algorithms for Computing Trigonometric Sums Using AVX-512 Extensions
Stpiczynski, Przemyslaw
COMPUTATIONAL SCIENCE, ICCS 2024, PT VI, 2024, 14937 : 158 - 172
[35] Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions
Bramas, Berenger
Kus, Pavel
PEERJ COMPUTER SCIENCE, 2018
[36] Combining Algorithmic Rethinking and AVX-512 Intrinsics for Efficient Simulation of Subcellular Calcium Signaling
Jarvis, Chad
Lines, Glenn Terje
Langguth, Johannes
Nakajima, Kengo
Cai, Xing
COMPUTATIONAL SCIENCE - ICCS 2019, PT V, 2019, 11540 : 681 - 687
[37] Acceleration of Particle Swarm Optimization with AVX Instructions
Safarik, Jakub
Snasel, Vaclav
APPLIED SCIENCES-BASEL, 2023, 13 (02)
[39] Hydrogen-helium chemical and nuclear galaxy collision: Hydrodynamic simulations on AVX-512 supercomputers
Chernykh, Igor
Kulikov, Igor
Tutukov, Alexander
JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2021, 391 (391)
[40] Vectorization of High-performance Scientific Calculations Using AVX-512 Intruction Set
Shabanov, B. M.
Rybakov, A. A.
Shumilin, S. S.
LOBACHEVSKII JOURNAL OF MATHEMATICS, 2019, 40 (05) : 580 - 598
