Accurate Low-Bit Length Floating-Point Arithmetic with Sorting Numbers

Cited by: 0

Authors:

Alireza Dehghanpour

Javad Khodamoradi Kordestani

Masoud Dehyadegari

Affiliations:

[1] K. N. Toosi University of Technology, Faculty of Computer Engineering

[2] Institute for Research in Fundamental Sciences (IPM), School of Computer Science

Source:

Neural Processing Letters | 2023 / Vol. 55

Keywords:

Deep neural networks; Floating point; Sorting; AlexNet; Convolutional neural networks

DOI:

Not available

Abstract:

A 32-bit floating-point format is commonly used to develop and train deep neural networks. Training and inference in deep learning-optimized codecs can yield large gains in performance and energy efficiency. However, training and running inference with low-bit neural networks remain significant challenges. In this study, we propose a sorting method that preserves accuracy in numerical formats with few bits. We tested the method on convolutional neural networks, including AlexNet. With our method, our convolutional neural network reached the same accuracy with an 11-bit format as with the IEEE 32-bit format, and AlexNet reached the same accuracy with a 10-bit format as with the IEEE 32-bit format. These results suggest that the sorting method is promising for limited-precision computation.
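The abstract does not describe how the sorting is performed. As a generic illustration of why the order of operations matters at low precision (not the authors' method), the sketch below simulates a hypothetical format with a reduced significand and compares left-to-right summation with summation in increasing order of magnitude, a well-known way to reduce rounding error in floating-point sums. The function names `quantize`, `low_bit_sum`, and `sorted_low_bit_sum` and the 8-bit significand are assumptions chosen only for illustration, and the simulation ignores the limited exponent range of a real low-bit format.

```python
import math
from typing import Iterable, List


def quantize(x: float, bits: int) -> float:
    """Round x to the nearest value with `bits` significand bits (ties to even).

    Assumption: the exponent range is unbounded, so overflow, underflow and
    subnormals of a real low-bit format are not modelled.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    # x = mantissa * 2**exponent with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(x)
    scale = 1 << bits
    # Python's round() rounds half to even, like IEEE 754 round-to-nearest-even.
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def low_bit_sum(values: Iterable[float], bits: int) -> float:
    """Accumulate left to right, rounding every partial sum to `bits` bits.

    The sum is computed in double precision and then rounded, which
    approximates a true low-bit adder.
    """
    acc = 0.0
    for v in values:
        acc = quantize(acc + quantize(v, bits), bits)
    return acc


def sorted_low_bit_sum(values: Iterable[float], bits: int) -> float:
    """Same accumulation, but add terms in increasing order of magnitude."""
    return low_bit_sum(sorted(values, key=abs), bits)


if __name__ == "__main__":
    BITS = 8  # hypothetical 8-bit significand, chosen only for illustration
    data: List[float] = [1.0] + [2.0 ** -9] * 256
    print("exact  :", math.fsum(data))
    print("naive  :", low_bit_sum(data, BITS))
    print("sorted :", sorted_low_bit_sum(data, BITS))
```

With an 8-bit significand, each term 2^-9 is below half a unit in the last place of 1.0, so the naive loop rounds every addition back to 1.0 and prints 1.0. Adding the small terms first lets them accumulate exactly to 0.5 before 1.0 is added, so the sorted sum prints 1.5, matching the exact result.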

Pages: 12061-12078

Number of pages: 17
