Faster Multiplication in Z_{2^m}[x] on Cortex-M4 to Speed up NIST PQC Candidates

Cited by: 15
Authors
Kannwischer, Matthias J. [1]
Rijneveld, Joost [1]
Schwabe, Peter [1]
Affiliation
[1] Radboud Univ Nijmegen, Nijmegen, Netherlands
Keywords
ARM Cortex-M4; Karatsuba; Toom; Lattice-based KEMs; NTRU
DOI
10.1007/978-3-030-21568-2_14
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper we optimize multiplication of polynomials in Z_{2^m}[x] on the ARM Cortex-M4 microprocessor. We use these optimized multiplication routines to speed up the NIST post-quantum candidates RLizard, NTRU-HRSS, NTRUEncrypt, Saber, and Kindi. For most of these schemes, the only previous implementation that runs on the Cortex-M4 is the reference implementation submitted to NIST; for some of them, our optimized software is more than a factor of 20 faster. One of the schemes, namely Saber, has been optimized on the Cortex-M4 in a CHES 2018 paper; the multiplication routine for Saber we present here outperforms the multiplication from that paper by 42%, yielding speedups of 22% for key generation, 20% for encapsulation, and 22% for decapsulation. Of the five schemes optimized in this paper, NTRU-HRSS achieves the best performance for encapsulation and decapsulation. Specifically, encapsulation takes just over 400 000 cycles, which is more than twice as fast as any other NIST candidate that has previously been optimized on the ARM Cortex-M4.
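The keywords name Karatsuba and Toom as the multiplication techniques. As a rough illustration only, and not the authors' optimized Cortex-M4 code, the sketch below shows one level of Karatsuba in Z_{2^m}[x]: three half-size products replace four, and reducing a coefficient mod 2^m is a single bitwise AND. Karatsuba uses only additions, subtractions, and multiplications, so it applies directly mod 2^m; Toom-Cook interpolation, by contrast, divides by even constants, which are not invertible mod 2^m. The width m = 13 and the helper names `schoolbook` and `karatsuba` are illustrative assumptions.

```python
import random

M = 13               # assumed coefficient width m (illustrative choice)
MASK = (1 << M) - 1  # reducing mod 2^m is a bitwise AND with 2^m - 1


def schoolbook(a, b):
    """Multiply a*b in Z_{2^m}[x]; returns len(a) + len(b) - 1 coefficients."""
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) & MASK
    return r


def karatsuba(a, b):
    """One Karatsuba level: 3 half-size schoolbook products instead of 4.

    With a = a0 + a1*x^h and b = b0 + b1*x^h:
      a*b = z0 + (z1 - z0 - z2)*x^h + z2*x^(2h),
    where z0 = a0*b0, z2 = a1*b1, z1 = (a0 + a1)*(b0 + b1).
    No division is needed, so the identity holds mod 2^m.
    """
    n = len(a)
    assert n == len(b) and n % 2 == 0
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    z0 = schoolbook(a0, b0)
    z2 = schoolbook(a1, b1)
    z1 = schoolbook([(x + y) & MASK for x, y in zip(a0, a1)],
                    [(x + y) & MASK for x, y in zip(b0, b1)])
    r = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        # Python's & on a negative int yields the non-negative residue mod 2^m
        mid = (z1[i] - z0[i] - z2[i]) & MASK
        r[i] = (r[i] + z0[i]) & MASK
        r[i + h] = (r[i + h] + mid) & MASK
        r[i + 2 * h] = (r[i + 2 * h] + z2[i]) & MASK
    return r


if __name__ == "__main__":
    random.seed(1)
    a = [random.randrange(1 << M) for _ in range(16)]
    b = [random.randrange(1 << M) for _ in range(16)]
    print(karatsuba(a, b) == schoolbook(a, b))  # True
```

Applying the split recursively (and combining it with Toom-Cook layers) reduces the cost further for large polynomials; this single level is kept deliberately small so the algebra stays visible.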
Pages: 281-301
Number of pages: 21