A Low Latency Floating-Point Sine and Cosine Function Hardware Implementation Algorithm

Authors
Liang F. [1]
Liu C. [2]
Li X. [1]
Qiu G. [2]
Zhang J. [2]
Chen Z. [2]
Li W. [2]
Cao Q. [1]
Lei S. [1]
Affiliations
[1] School of Microelectronics, Xi'an Jiaotong University, Xi'an
[2] The 58th Research Institute of China Electronics Technology Group Corporation, Wuxi
Keywords
Coordinate rotation digital computer algorithm; Hardware implementation; Sine and cosine function; Taylor first-order approximation
DOI
10.7652/xjtuxb202111012
Abstract
To address the difficulty that the coordinate rotation digital computer (CORDIC) algorithm has in meeting low-latency and high-accuracy requirements in hardware, a low-latency hardware implementation algorithm for floating-point sine and cosine functions is proposed. Based on the mathematical properties of sine and cosine, the algorithm divides the range of the input floating-point number's exponent into three regions: [-126, -16], [-15, 21], and [22, 126]. These regions use Taylor 0-order approximation, Taylor 1st-order approximation, and direct calculation, respectively. The Taylor 1st-order approximation is optimized by converting to fixed-point arithmetic, reducing the argument with trigonometric reduction (induction) formulas, and precomputing the output exponent, which lowers computational complexity and improves parallelism. A 4-stage pipelined hardware implementation of the optimized Taylor 1st-order approximation is provided. In an exhaustive (traversal) test, the results of the proposed algorithm differ from those of the C-language math library by at most 1 ulp (unit in the last place). In a UMC 55 nm process, the circuit reaches a clock frequency of 250 MHz with a total latency of only 4 clock cycles per operation, demonstrating its low latency. © 2021, Editorial Office of Journal of Xi'an Jiaotong University. All rights reserved.
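The exponent-based region split described in the abstract can be illustrated with a small software sketch. It assumes the input is an IEEE 754 single-precision value, which the abstract does not state explicitly, and the names `unbiased_exponent`, `region`, and `taylor1_sin` are hypothetical. The first-order Taylor step expands about a nearby grid point with an assumed spacing of 2^-12; the expansion point, table size, and fixed-point format used in the paper are not given here, so this is only a generic illustration of the technique, not the authors' datapath.

```python
import math
import struct


def unbiased_exponent(x: float) -> int:
    """Unbiased exponent of x after rounding to IEEE 754 binary32 (assumed format)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return ((bits >> 23) & 0xFF) - 127


def region(x: float) -> str:
    """Classify x into the three exponent regions listed in the abstract."""
    e = unbiased_exponent(x)
    if -126 <= e <= -16:
        return "taylor0"   # Taylor 0-order approximation
    if -15 <= e <= 21:
        return "taylor1"   # Taylor 1st-order approximation
    if 22 <= e <= 126:
        return "direct"    # direct calculation
    return "outside"       # zero, subnormal, e = 127, inf, NaN: not covered by the abstract


def taylor1_sin(x: float, step_bits: int = 12) -> float:
    """First-order Taylor expansion of sin about a grid point a near x.

    step_bits is an assumed grid resolution (spacing 2**-step_bits).
    In hardware, sin(a) and cos(a) would come from a lookup table;
    math.sin and math.cos stand in for that table here.
    """
    scale = 1 << step_bits
    a = round(x * scale) / scale       # nearest grid point
    d = x - a                          # small residual, |d| <= 2**-(step_bits + 1)
    return math.sin(a) + d * math.cos(a)
```

For example, `region(1.0)` falls in the first-order region because the exponent of 1.0 is 0, and the truncation error of `taylor1_sin` is bounded by d²/2, which is at most 2^-27 for the assumed spacing.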
Pages: 106-114
Number of pages: 8
Related papers
21 in total
  • [1] VOLDER J E., The CORDIC trigonometric computing technique, IRE Transactions on Electronic Computers, EC-8, 3, pp. 330-334, (1959)
  • [2] JUANG T B, HSIAO S F, TSAI M Y., Para-CORDIC: parallel CORDIC rotation algorithm, IEEE Transactions on Circuits and Systems: I Regular Papers, 51, 8, pp. 1515-1524, (2004)
  • [3] CHEN L, LOMBARDI F, JIE H, et al., A fully parallel approximate CORDIC design, Proceedings of the IEEE International Symposium on Nanoscale Architectures, pp. 197-202, (2016)
  • [4] SHUKLA R, RAY K C., Low latency hybrid CORDIC algorithm, IEEE Transactions on Computers, 63, 12, pp. 3066-3078, (2014)
  • [5] MAHARATNA K, BANERJEE S, GRASS E, et al., Modified virtually scaling-free adaptive CORDIC rotator algorithm and architecture, IEEE Transactions on Circuits and Systems for Video Technology, 15, 11, pp. 1463-1474, (2005)
  • [6] MOROZ L, MYKYTIV T, HERASYM M., Improved scaling-free CORDIC algorithm, Proceedings of the East-West Design and Test Symposium, pp. 1-5, (2013)
  • [7] JAIME F J, SANCHEZ M A, HORMIGO J, et al., Enhanced scaling-free CORDIC, IEEE Transactions on Circuits and Systems: I Regular Papers, 57, 7, pp. 1654-1662, (2010)
  • [8] HOU Nanxin, WANG Mingjiang, ZOU Xiafeng, et al., A low latency floating point CORDIC algorithm for sin/cosine function, Proceedings of the 2019 4th IEEE International Conference on Signal and Image Processing, pp. 751-755, (2019)
  • [9] MAHARATNA K, EL-SHABRAWY K, AL-HASHIMI B., Reduced Z-datapath CORDIC rotator, Proceedings of the 2008 IEEE International Symposium on Circuits and Systems, pp. 3374-3377, (2008)
  • [10] GISUTHAN B, SRIKANTHAN T., Flat CORDIC: a unified architecture for high-speed generation of trigonometric and hyperbolic functions, Proceedings of the 43rd IEEE Midwest Symposium on Circuits and Systems, pp. 1414-1417, (2000)