Faithfully Rounded Floating-point Computations

Cited by: 8
Authors
Lange, Marko [1]
Rump, Siegfried M. [1, 2]
Affiliations
[1] Waseda Univ, Fac Sci & Engn, Shinjuku Ku, 3-4-1 Okubo, Tokyo 1698555, Japan
[2] Hamburg Univ Technol, Inst Reliable Comp, Schwarzenberg Campus 3, D-21071 Hamburg, Germany
Source
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE | 2020 / Vol. 46 / No. 03
Funding
Japan Science and Technology Agency;
Keywords
Double-double; inaccurate cancellation; rigorous error bounds; ACCURATE;
DOI
10.1145/3290955
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
We present a pair arithmetic for the four basic operations and square root. It can be regarded as a simplified, more-efficient double-double arithmetic. The central assumption on the underlying arithmetic is the first standard model for error analysis for operations on a discrete set of real numbers. Neither do we require a floating-point grid nor a rounding to nearest property. Based on that, we define a relative rounding error unit u and prove rigorous error bounds for the computed result of an arbitrary arithmetic expression depending on u, the size of the expression, and possibly a condition measure. In the second part of this note, we extend the error analysis by examining requirements to ensure faithfully rounded outputs and apply our results to IEEE 754 standard conform floating-point systems. For a class of mathematical expressions, using an IEEE 754 standard conform arithmetic with base $\beta$, the result is proved to be faithfully rounded for up to $1/\sqrt{\beta u} - 2$ operations. Our findings cover a number of previously published algorithms to compute faithfully rounded results, among them Horner's scheme, products, sums, dot products, or Euclidean norm. Beyond that, several other problems can be analyzed, such as polynomial interpolation, orientation problems, Householder transformations, or the smallest singular value of Hilbert matrices of large size.
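To make the idea of a pair arithmetic in the abstract above more concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes IEEE 754 binary64 with round-to-nearest and no overflow or underflow, uses the classical TwoSum (Knuth) and TwoProduct (Dekker/Veltkamp) error-free transformations, and combines the low-order parts without renormalization, which is one plausible reading of "simplified double-double". All function names (`two_sum`, `two_prod`, `pair_add`, `pair_mul`, `pair_value`, `max_ops_bound`) are hypothetical, and the choice u = 2^-53 for binary64 is likewise an assumption.

```python
import math

def two_sum(a, b):
    """TwoSum: return s = fl(a + b) and e such that a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def split(a):
    """Veltkamp split of a binary64 value into high and low halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo

def two_prod(a, b):
    """Dekker's TwoProduct: return p = fl(a * b) and e with a * b == p + e."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

def pair_add(x, y):
    """Add two pairs (hi, lo); low parts are accumulated, not renormalized."""
    s, e = two_sum(x[0], y[0])
    return s, e + (x[1] + y[1])

def pair_mul(x, y):
    """Multiply two pairs (hi, lo); the lo*lo term is dropped (assumed sketch)."""
    p, e = two_prod(x[0], y[0])
    return p, e + (x[0] * y[1] + x[1] * y[0])

def pair_value(x):
    """Round the pair back to a single floating-point number."""
    return x[0] + x[1]

def max_ops_bound(beta, u):
    """Evaluate the operation-count bound 1/sqrt(beta*u) - 2 from the abstract."""
    return 1.0 / math.sqrt(beta * u) - 2.0

# Example with severe cancellation: (1 + 2^-30) * (1 - 2^-30) - 1 = -2^-60.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
plain = a * b - 1.0
paired = pair_value(pair_add(pair_mul((a, 0.0), (b, 0.0)), (-1.0, 0.0)))
```

In this example the plain binary64 evaluation loses the result entirely to cancellation, while the pair evaluation carries the rounding error of the product in the low component and recovers it. Under the stated assumption u = 2^-53 and base 2, `max_ops_bound(2, 2.0**-53)` evaluates the abstract's bound for binary64.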
Pages: 20
Related papers
50 in total
  • [31] FLOATING-POINT GEOMETRY: TOWARDS GUARANTEED GEOMETRIC COMPUTATIONS WITH APPROXIMATE ARITHMETICS?
    Bajard, Jean-Claude
    Langlois, Philippe
    Michelucci, Dominique
    Morin, Geraldine
    Revol, Nathalie
    ADVANCED SIGNAL PROCESSING ALGORITHMS, ARCHITECTURES, AND IMPLEMENTATIONS XVIII, 2008, 7074
  • [32] Algorithms for Stochastically Rounded Elementary Arithmetic Operations in IEEE 754 Floating-Point Arithmetic
    Fasi, Massimiliano
    Mikaitis, Mantas
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2021, 9 (03) : 1451 - 1466
  • [33] Algorithms for Stochastically Rounded Elementary Arithmetic Operations in IEEE 754 Floating-Point Arithmetic
    Fasi, Massimiliano
    Mikaitis, Mantas
    2021 IEEE 28TH SYMPOSIUM ON COMPUTER ARITHMETIC (ARITH 2021), 2021, : 69 - 69
  • [34] Accurate Floating-point Operation using Controlled Floating-point Precision
    Zaki, Ahmad M.
    Bahaa-Eldin, Ayman M.
    El-Shafey, Mohamed H.
    Aly, Gamal M.
    2011 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2011, : 696 - 701
  • [35] Floating-point arithmetic
    Boldo, Sylvie
    Jeannerod, Claude-Pierre
    Melquiond, Guillaume
    Muller, Jean-Michel
    ACTA NUMERICA, 2023, 32 : 203 - 290
  • [36] On Using Floating-Point Computations to Help an Exact Linear Arithmetic Decision Procedure
    Monniaux, David
    COMPUTER AIDED VERIFICATION, PROCEEDINGS, 2009, 5643 : 570 - 583
  • [37] A Reflexive Tactic for Polynomial Positivity using Numerical Solvers and Floating-Point Computations
    Martin-Dorel, Erik
    Roux, Pierre
    PROCEEDINGS OF THE 6TH ACM SIGPLAN CONFERENCE ON CERTIFIED PROGRAMS AND PROOFS, CPP'17, 2017, : 90 - 99
  • [38] An algorithm for converting floating-point computations to fixed-point in MATLAB based FPGA design
    Roy, S
    Banerjee, P
    41ST DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 2004, 2004, : 484 - 487
  • [39] On floating-point summation
    Espelid, TO
    SIAM REVIEW, 1995, 37 (04) : 603 - 607
  • [40] FLOATING-POINT COMPUTATION
    STERBENZ, P
    TRANSACTIONS OF THE NEW YORK ACADEMY OF SCIENCES, 1974, 36 (06) : 591 - 591