A Low-Cost Floating-Point Dot-Product-Dual-Accumulate Architecture for HPC-Enabled AI

Times cited: 2
Authors
Tan, Hongbing [1 ]
Huang, Libo [1 ]
Zheng, Zhong [1 ]
Guo, Hui [1 ]
Yang, Qianmin [1 ]
Shen, Li [1 ]
Chen, Gang [2 ]
Xiao, Liquan [1 ]
Xiao, Nong
Affiliations
[1] Natl Univ Def Technol, Coll Comp Sci & Technol, Changsha 410073, Peoples R China
[2] Sun Yat sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
Keywords
Dot-product-dual-accumulate (DPDAC); fused multiply-add; high-performance computing (HPC)-enabled artificial intelligence (AI); mixed-precision; numerical precision conversion; transprecision computing; FUSED-MULTIPLY-ADD; PERFORMANCE;
DOI
10.1109/TCAD.2023.3316994
CLC classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The dot product $\sum_{i=1}^{N} A_i \times B_i$ is one of the most frequently used operations in a wide variety of high-performance computing (HPC) and artificial intelligence (AI) applications. However, in large-scale algorithms such as general matrix multiplication (GEMM) and the fast Fourier transform (FFT), the results of length-limited dot products must be combined by independent additions to form the final result, which increases latency and overhead. Hence, we propose a dot-product-dual-accumulate (DPDAC) architecture capable of performing $\sum_{i=1}^{N} A_i \times B_i + \sum_{j=1}^{M} C_j$, with $N \in \{1, 2, 4\}$ and $M \in \{1, 2\}$, on a wide range of formats. The proposed architecture supports both single-path and dual-path execution. The single path performs double-precision (DP) fused multiply-add (FMA) or DPDAC on lower-precision formats, while the dual path supports parallel execution of single-precision (SP) addition and a 2-term SP or TF32 dot product or a 4-term half-precision (HP) or BF16 dot product. Moreover, the proposed architecture supports numerical precision conversion, allowing numbers to be converted to higher or lower formats. The proposed DPDAC significantly reduces overhead compared with discrete designs that use multiple single-mode floating-point (FP) units to achieve the same functionality. Furthermore, compared with state-of-the-art multiple-precision designs, the proposed architecture supports a wide range of formats and a greater variety of operations at lower cost.
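To make the DPDAC operation concrete, here is a small behavioral model in Python. It is a sketch under stated assumptions, not the authors' hardware algorithm. The names `FORMATS`, `round_to_format`, `dpdac` and `long_dot` are hypothetical. The format parameters are the standard ones: DP, SP and HP from IEEE 754, TF32 with an 8-bit exponent and 10-bit fraction, and BF16 with an 8-bit exponent and 7-bit fraction. The model also assumes several behaviors the abstract does not state:

- the whole expression is evaluated exactly and rounded once (fused semantics);
- the addends $C_j$ and the result use the accumulation format;
- only round-to-nearest-even is modeled;
- inputs are finite.

`round_to_format` doubles as a model of down-conversion to a lower format.

```python
import math
from fractions import Fraction

# (fraction bits, exponent bits) for each format named in the abstract.
FORMATS = {
    "DP": (52, 11),
    "SP": (23, 8),
    "TF32": (10, 8),
    "HP": (10, 5),
    "BF16": (7, 8),
}


def round_to_format(x: Fraction, fmt: str) -> float:
    """Round an exact rational to the nearest value of `fmt` (ties to even)."""
    man_bits, exp_bits = FORMATS[fmt]
    if x == 0:
        return 0.0
    sign = -1 if x < 0 else 1
    a = abs(x)
    bias = (1 << (exp_bits - 1)) - 1
    emin = 1 - bias
    # Find e with 2**e <= a < 2**(e + 1).
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    e = max(e, emin)  # subnormals share the minimum exponent
    quantum = Fraction(2) ** (e - man_bits)  # spacing of representable values
    q = a / quantum
    n = q.numerator // q.denominator
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    result = n * quantum
    max_finite = (2 - Fraction(2) ** -man_bits) * Fraction(2) ** bias
    if result > max_finite:
        return sign * math.inf  # overflow under round-to-nearest
    return sign * float(result)  # exact: every format here fits in a double


def dpdac(a, b, c, in_fmt, acc_fmt):
    """Hypothetical model of sum(A_i * B_i, i=1..N) + sum(C_j, j=1..M).

    Assumption: products use in_fmt inputs, addends and result use acc_fmt,
    and the exact sum is rounded only once.
    """
    if len(a) != len(b) or len(a) not in (1, 2, 4):
        raise ValueError("N must be 1, 2 or 4")
    if len(c) not in (1, 2):
        raise ValueError("M must be 1 or 2")
    exact = Fraction(0)
    for x, y in zip(a, b):
        xq = Fraction(round_to_format(Fraction(x), in_fmt))
        yq = Fraction(round_to_format(Fraction(y), in_fmt))
        exact += xq * yq
    for z in c:
        exact += Fraction(round_to_format(Fraction(z), acc_fmt))
    return round_to_format(exact, acc_fmt)


def long_dot(a, b, in_fmt, acc_fmt):
    """Chain 4-term DPDACs, feeding the running sum back in as C_1."""
    acc = 0.0
    for k in range(0, len(a), 4):
        xs = list(a[k:k + 4])
        ys = list(b[k:k + 4])
        pad = 4 - len(xs)  # zero-pad the last chunk to N = 4
        acc = dpdac(xs + [0.0] * pad, ys + [0.0] * pad, [acc], in_fmt, acc_fmt)
    return acc
```

With x = 1 + 2**-12, `dpdac([x], [x], [-1.0], "SP", "SP")` keeps the low-order term and returns 2**-11 + 2**-24. Rounding the product to SP before adding -1 gives only 2**-11, which is the intermediate rounding a fused operation avoids. `long_dot` illustrates the motivation in the abstract: the running partial sum enters as an addend of the next DPDAC, so no separate addition is issued. It also rounds once per 4-term chunk.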
Pages: 681-693
Number of pages: 13