Acceleration of Large Integer Multiplication with Intel AVX-512 Instructions

Cited by: 3

Authors:

Edamatsu, Takuya [1]

Takahashi, Daisuke [2]

Affiliations:

[1] Univ Tsukuba, Coll Informat Sci, Tsukuba, Ibaraki, Japan

[2] Univ Tsukuba, Ctr Computat Sci, Tsukuba, Ibaraki, Japan

Source:

IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS) | 2018

Keywords:

AVX-512; SIMD; Knights Landing; large integer multiplication; reduced-radix representation;

DOI:

10.1109/HPCC/SmartCity/DSS.2018.00059

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose an implementation of large integer multiplication using Single Instruction Multiple Data (SIMD) instructions. We evaluated the implementation on an Intel Xeon Phi processor. The second generation Intel Xeon Phi processor, Knights Landing, has a set of Advanced Vector Extensions-512 (AVX-512) instructions. Using AVX-512, the processor can handle 512 bits at the same time and has the potential to multiply faster than a processor using Streaming SIMD Extensions (SSE) and AVX. Therefore, we applied AVX-512F (foundation) instructions to the program. In the multiplication of large integers, as the number of digits increases, various processing costs also become larger. One of these costs is carry processing. Therefore, we implemented a multiplication function using a reduced-radix representation and compared the execution time and the number of instructions against the GNU Multiple Precision Arithmetic Library (GMP). Furthermore, we used some optimization techniques for this kernel. We successfully achieved an execution time that was approximately 2.5x faster than GMP on the Knights Landing architecture.
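
The abstract names carry processing as a cost that a reduced-radix representation helps with. The following is a minimal scalar sketch of that general idea, not the authors' AVX-512 kernel: each limb holds fewer bits than the word that stores it, so partial products can be accumulated without carries and normalized in one final pass. The 28-bit limb width and the names `to_limbs`, `from_limbs`, and `multiply_reduced_radix` are assumptions chosen for illustration; the record does not state the radix used in the paper.

```python
# Illustrative sketch only. LIMB_BITS = 28 is an assumed radix, not taken from the paper.
LIMB_BITS = 28
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative integer into `count` limbs, least significant first."""
    limbs = []
    for _ in range(count):
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
    if n:
        raise ValueError("value does not fit in the requested number of limbs")
    return limbs


def from_limbs(limbs):
    """Recombine limbs (least significant first) into a Python integer."""
    n = 0
    for limb in reversed(limbs):
        n = (n << LIMB_BITS) | limb
    return n


def multiply_reduced_radix(a, b):
    """Schoolbook multiplication with carries deferred to a single final pass."""
    # Each partial product is below 2**56, so a 64-bit accumulator could absorb
    # up to 2**8 of them per column before overflowing. Python integers do not
    # overflow, so this check only documents the limit a 64-bit lane would have.
    assert min(len(a), len(b)) <= 1 << (64 - 2 * LIMB_BITS)
    acc = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            acc[i + j] += ai * bj  # no carry handling inside the loops
    # One carry-propagation pass brings every limb back below 2**LIMB_BITS.
    carry = 0
    for k in range(len(acc)):
        total = acc[k] + carry
        acc[k] = total & LIMB_MASK
        carry = total >> LIMB_BITS
    assert carry == 0
    return acc
```

In a full-radix layout, where every limb fills its word, each partial product can overflow and carries have to be handled inside the inner loop. The spare bits in the reduced-radix layout are what allow the additions in the inner loop to proceed independently, which is the property that makes the representation attractive for SIMD lanes.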


Pages: 211-218

Number of pages: 8

50 items in total

[41] Optimization of a sparse grid-based data mining kernel for architectures using AVX-512
Sarbu, Paul-Cristian
Bungartz, Hans-Joachim
2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 364 - 371
[42] Fused Table Scans: Combining AVX-512 and JIT to Double the Performance of Multi-Predicate Scans
Dreseler, Markus
Kossmann, Jan
Frohnhofen, Johannes
Uflacker, Matthias
Plattner, Hasso
2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2018, : 102 - 109
[43] Conflict Detection-based Run-Length Encoding - AVX-512 CD Instruction Set in Action
Ungethuem, Annett
Pietrzyk, Johannes
Damme, Patrick
Habich, Dirk
Lehner, Wolfgang
2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2018, : 96 - 101
[44] SPC5: an efficient SpMV framework vectorized using ARM SVE and x86 AVX-512
Regnault, Evann
Bramas, Berenger
COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2024, 21 (01) : 203 - 221
[45] Effective Implementation of Matrix-Vector Multiplication on Intel's AVX multicore Processor
Hassan, Somaia A.
Mahmoud, Mountasser M. M.
Hemeida, A. M.
Saber, Mahmoud A.
COMPUTER LANGUAGES SYSTEMS & STRUCTURES, 2018, 51 : 158 - 175
[46] LARGE INTEGER MULTIPLICATION ON HYPERCUBES
FAGIN, BS
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1992, 14 (04) : 426 - 430
[47] An Exploration of Using the Intel AVX2 Gather Load Instructions for Vectorised Image Processing
Cree, Michael J.
2018 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2018,
[48] Evaluation of Large Integer Multiplication Methods on Hardware
Rafferty, Ciara
O'Neill, Maire
Hanley, Neil
IEEE TRANSACTIONS ON COMPUTERS, 2017, 66 (08) : 1369 - 1382
[49] Faster multiplication over F2[X] using AVX512 instruction set and VPCLMULQDQ instruction
Robert, Jean-Marc
Veron, Pascal
JOURNAL OF CRYPTOGRAPHIC ENGINEERING, 2023, 13 (01) : 37 - 55
[50] Investigating Large Integer Arithmetic on Intel Xeon Phi SIMD Extensions
Keliris, Anastasis
Maniatakos, Michail
2014 9TH IEEE INTERNATIONAL CONFERENCE ON DESIGN & TECHNOLOGY OF INTEGRATED SYSTEMS IN NANOSCALE ERA (DTIS 2014), 2014,
