Efficient Low-Latency Hardware Architecture for Module-Lattice-Based Digital Signature Standard

Cited by: 1
Authors
Truong, Quang Dang [1]
Duong, Phap Ngoc [1, 2]
Lee, Hanho [1]
Affiliations
[1] Inha Univ, Dept Elect & Comp Engn, Incheon 22212, South Korea
[2] Univ Danang, Vietnam Korea Univ Informat & Commun Technol, Fac Comp Engn & Elect, Da Nang 50000, Vietnam
Keywords
Computer architecture; Digital signatures; Standards; NIST; Arithmetic; Low latency communication; Quantum computing; Cryptography; Lattices; Public key cryptography; Field programmable gate arrays; Security management; Hardware security; Post-quantum cryptography (PQC); module-lattice-based digital signature standard (ML-DSA); CRYSTALS-Dilithium; lattice-based cryptography (LBC); number theoretic transform (NTT)
DOI
10.1109/ACCESS.2024.3370470
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid advancement of powerful quantum computers poses a significant security risk to current public-key cryptosystems, which rely heavily on the computational hardness of problems such as discrete logarithms and integer factorization. As a result, CRYSTALS-Dilithium, a lattice-based digital signature scheme designed to withstand both quantum and classical attacks, has been standardized as ML-DSA following the NIST Post-Quantum Cryptography competition. While prior studies have proposed hardware designs to accelerate this cryptosystem, there is room for further optimization in the tradeoff between performance and hardware resource consumption. This paper addresses these limitations by presenting an efficient low-latency hardware architecture for ML-DSA, leveraging optimized timing schedules for its three main algorithms. The hardware implementation supports runtime switching between the main ML-DSA operations at various security levels. We design flexible arithmetic and hash modules tailored for ML-DSA; these are the most time-consuming submodules and the key determinants of the scheme's implementation. Combined with efficient operation scheduling that maximizes submodule utilization, our design achieves the best latency among FPGA-based implementations, outperforming state-of-the-art works by 1.27x to 2.58x in terms of the area-time tradeoff metric. The proposed hardware architecture therefore demonstrates its practical applicability to digital signature cryptosystems in the post-quantum era.
Pages: 32395-32407
Page count: 13