Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Authors
Alireza Dehghanpour
Javad Khodamoradi Kordestani
Masoud Dehyadegari
Affiliations
[1] K. N. Toosi University of Technology, Faculty of Computer Engineering
[2] Institute for Research in Fundamental Sciences (IPM), School of Computer Science
Source
Neural Processing Letters | 2023 / Vol. 55
Keywords
Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks;
DOI
Not available
Abstract
The 32-bit floating-point format is commonly used to develop and train deep neural networks. Training and inference in deep learning-optimized codecs can yield large gains in performance and energy efficiency. However, training and running inference on low-bit neural networks remains a significant challenge. In this study, we propose a sorting method that maintains accuracy in numerical formats with a small number of bits. We tested this method on convolutional neural networks, including AlexNet. With our method, the accuracy achieved with 11 bits in our convolutional neural network matches that of the IEEE 32-bit format. Similarly, in AlexNet, the accuracy achieved with 10 bits matches that of the IEEE 32-bit format. These results suggest that the sorting method shows promise for limited-precision calculations.
Pages: 12061-12078
Page count: 17
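The abstract does not spell out the sorting method, so the following is only a minimal sketch of the general idea that operand order affects low-precision accumulation. It is not the authors' algorithm. It assumes IEEE 754 binary16 as the low-bit format, emulated with Python's `struct` module, and the helper names `to_half`, `half_sum`, and `sorted_half_sum` are hypothetical.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_sum(values):
    """Accumulate left to right, rounding to binary16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def sorted_half_sum(values):
    """Same accumulation, but with the smallest magnitudes added first.

    Assumption: ascending-magnitude ordering stands in for the paper's
    sorting step, which the abstract does not describe in detail.
    """
    return half_sum(sorted(values, key=abs))


# One large term followed by many small ones. Near 2048 the binary16
# spacing is 2, so each "+ 1.0" is rounded away when the large term
# is already in the accumulator.
values = [2048.0] + [1.0] * 64

naive = half_sum(values)
ordered = sorted_half_sum(values)
print(naive, ordered)  # 2048.0 2112.0
```

In this example the unsorted sum loses all 64 small terms, while the sorted sum matches the exact result of 2112. Real network workloads mix signs and magnitudes, so the benefit there would have to be measured rather than assumed.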