Timeline for Is floating point math broken?

Current License: CC BY-SA 3.0

34 events
Dec 29, 2021 at 21:05 comment added Peter Cordes Anyway, the default floating-point environment on standard IEEE-conforming systems supports subnormal numbers. Some, such as x86, support flushing denormals to exactly 0.0, because some CPUs handle gradual underflow by taking a microcode exception which is very slow, @Pacerier. Disabling that lets the normal fast-path always happen.
Dec 29, 2021 at 21:01 comment added Peter Cordes Allowing gradual underflow (non-zero numbers smaller than that, with less precision) does not hurt the precision of normal numbers. Maybe you meant to say that applications only depend on precision for normalized numbers, so hard underflow to +-0.0 instead of gradual underflow to subnormals doesn't hurt most programs? (i.e. the way gcc -ffast-math works on x86, setting flush-to-zero and denormals-are-zero. Or like 32-bit ARM NEON SIMD which doesn't support subnormals, @Pacerier). That doesn't make anything worse for non-tiny numbers; they still produce identical results.
Dec 29, 2021 at 20:56 comment added Peter Cordes The exponent range is separate from the mantissa width. 1 part in 2^53 is the relative precision of IEEE double, and thus epsilon (1 ulp of (double)1.0), but normalized floats have full precision for numbers down to 2^(-1022) ~= 2.2e-308, the smallest normalized float. en.wikipedia.org/wiki/….
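The distinction Peter Cordes draws here — mantissa precision versus exponent range, with subnormals extending gradual underflow below the smallest normal — can be checked directly from Python; a small sketch (the constants below follow from IEEE double, not from anything specific to this thread):

```python
import math
import sys

# Machine epsilon: 1 ulp of 1.0, i.e. 2^-52 for IEEE double
assert sys.float_info.epsilon == 2.0 ** -52
assert math.ulp(1.0) == sys.float_info.epsilon

# Smallest positive *normalized* double: 2^-1022, full 53-bit precision
assert sys.float_info.min == 2.0 ** -1022

# Gradual underflow: subnormals continue below that, with reduced precision
smallest_subnormal = 5e-324            # this literal rounds to 2^-1074
assert smallest_subnormal == 2.0 ** -1074
assert 0.0 < smallest_subnormal < sys.float_info.min

# Nothing smaller exists: halving the smallest subnormal underflows to zero
assert smallest_subnormal / 2 == 0.0
```

On a flush-to-zero system (as discussed above for `-ffast-math` on x86 or 32-bit ARM NEON), the subnormal checks would fail: anything below `sys.float_info.min` would collapse straight to 0.0.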
S Apr 13, 2018 at 16:42 history suggested Vijay S CC BY-SA 3.0
Edit some corrections
Apr 13, 2018 at 9:07 review Suggested edits
S Apr 13, 2018 at 16:42
Apr 18, 2017 at 16:14 comment added KernelPanik @Pacerier, Normalized mode allows for accuracy down to machine epsilon for the mantissa or numeric part, which is 2^-53 for double precision. Since many applications depend upon accuracy down to machine epsilon, normalized mode is typical (root finding comes to mind). Support for denormalized mode is also very common today on modern computers (as in the JavaScript example), at the expense of precision in the last decimal places. It's less common to support denormalized mode on many embedded systems.
Apr 17, 2017 at 19:13 comment added Pacerier @KernelPanik, Interval-arithmetic though, is surprisingly common. Eg putting items within a box, or drawing pixels within a rectangle [screen]. Also, why do you say that normalized mode is "typical"? And do you mean Javascript's mandatory denormalized mode support is atypical?
Jan 7, 2017 at 23:01 history edited Sneftel CC BY-SA 3.0
Bring section 4 in line with section 5
Dec 9, 2016 at 4:27 comment added Peter Cordes Anyway, this answer is correct that error-accumulation from repeated operations does happen. Also a good point about "what is an operation"; FMA changes things. It also has some interesting details about how FP division is implemented in hardware, even though as Stephen Canon pointed out those are irrelevant for accuracy. I think I'm going to have to downvote for confusingly implying that better than 0.5 ulp is possible. :/
Dec 9, 2016 at 4:24 comment added Peter Cordes This makes bit-exact deterministic computation possible across different hardware. But not in C; even without any -ffast-math options, the C language allows the compiler too much freedom. However, in asm you can run the same code on different implementations of x86 and get bit-exact results. See this answer about FP determinism in C vs. x86 asm.
Dec 9, 2016 at 4:21 comment added Peter Cordes A max error of 1/2 ulp for the "basic" operations (add, sub, mul, div, and sqrt) means the result is correctly rounded out to the last bit of the mantissa. This answer makes it sound like more accuracy would be possible if speed / power / die-area weren't a concern, but that's not the case. For any given inputs to an operation like ADD, there is exactly one result allowed by the IEEE standard. Hardware has to compute enough extra bits of the exact result to figure out what the correctly rounded result is, but has no choice in "how much error" to leave in the result of ADDSS for example.
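The "correctly rounded to within 1/2 ulp" guarantee described above can be verified empirically, since every double is an exact rational; a sketch using Python's exact rational arithmetic (the helper name is mine):

```python
import math
from fractions import Fraction

def addition_error_in_ulps(a, b):
    """Error of the computed a+b relative to the exact sum, measured in ulps."""
    exact = Fraction(a) + Fraction(b)   # doubles convert to exact rationals
    computed = a + b                    # one IEEE-rounded operation
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

# Basic operations are correctly rounded: the error never exceeds 1/2 ulp
for a, b in [(0.1, 0.2), (1e16, 1.0), (3.14, 2.71828)]:
    assert addition_error_in_ulps(a, b) <= Fraction(1, 2)
```

The `(1e16, 1.0)` pair hits the bound exactly: the spacing at 1e16 is 2.0, so the true sum falls halfway-adjacent and the result is off by exactly 1/2 ulp — the maximum the standard permits.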
Jun 9, 2016 at 17:32 comment added Solomon Slow @DigitalRoss, I read your answer. It explains why there is no binary floating point (BFP) number that represents the real number, 0.01. I don't think we disagree about the reality, only on how to describe it. You say the BFP representation of 0.01 is "inexact." I say it does not exist. I say that when you type the string "0.01" into your computer, the conversion function gives you an inexact result. My way of thinking probably is colored by work I've done in the past on low-level math libraries for machines that did not have floating point hardware.
Jun 9, 2016 at 16:58 comment added DigitalRoss @james large, yes, the numbers are what they are, but they aren't what you entered. Just like you cannot actually ever have 1/3 as a decimal fraction, you cannot ever by any mechanism have 0.01 as a binary fraction. You can't have 0.02 either. Never. When you get to 0.25, that's 0.01(2) and now you can have an exact fraction. The very nature of the base 10 radix prevents most decimal fractions from being representable in the IEEE format. I referenced my explanation. Did you read it?
Jun 9, 2016 at 15:21 comment added Solomon Slow @DigitalRoss, we seem to be using different lexicons. What do you mean when you say that a number is "inexact?" I know how an answer can be inexact, but if my program has a floating point number stored in some variable, x, how can that number be anything other than exactly what it is? Your answer speaks of "giving it input that is slightly off from what we wrote." But, if "giving it input" means converting a string of decimal digits into a floating point value in memory, then that's actually a lengthy sequence of floating point operations (at least two per digit.)
Jun 9, 2016 at 1:20 comment added DigitalRoss @james -- I disagree. It is the numbers, not the operations. The ops are exact. But very very few of the decimal fractions we can write have an exact equivalent in the binary radix of floating point fractions. See my (really late, you'll need to scroll way down) answer for a complete explanation.
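Both sides of this exchange can be made concrete in Python: the value you get from the literal `0.01` is a nearby exact rational, just not 1/100. A quick sketch:

```python
from decimal import Decimal
from fractions import Fraction

x = 0.01                 # the decimal-to-binary conversion rounds to the nearest double
print(Decimal(x))        # prints the exact value actually stored, all digits

# The stored value is an exact dyadic rational -- it just isn't 1/100
assert Fraction(x) != Fraction(1, 100)
assert Fraction(x) == Fraction(5764607523034235, 2 ** 59)
```

This is consistent with both readings above: the number in memory is exactly what it is (Solomon Slow's point), and no binary fraction can ever equal 0.01 (DigitalRoss's point) — the rounding happened during conversion.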
Feb 1, 2016 at 15:33 comment added KernelPanik @Matt Sorry for the late response. It's basically due to resource/time issues and tradeoffs. There is a way to do long division/more 'normal' division, it's called SRT Division with radix two. However, this repeatedly shifts and subtracts the divisor from the dividend and takes many clock cycles since it only computes one bit of the quotient per clock cycle. We use tables of reciprocals so that we can compute more bits of the quotient per cycle and make effective performance/speed tradeoffs.
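The radix-2 shift-and-subtract scheme KernelPanik contrasts with table-based division can be sketched on integers — this is a toy restoring divider producing one quotient bit per iteration, not actual hardware SRT (which uses redundant quotient digits and reciprocal tables to retire more bits per cycle):

```python
def shift_subtract_divide(dividend, divisor, bits):
    """Radix-2 restoring division: one quotient bit per loop iteration."""
    quotient, remainder = 0, dividend
    for i in range(bits - 1, -1, -1):
        trial = divisor << i          # divisor aligned under quotient bit i
        if remainder >= trial:
            remainder -= trial        # subtract succeeds: quotient bit is 1
            quotient |= 1 << i
    return quotient, remainder

q, r = shift_subtract_divide(100, 7, bits=8)
assert (q, r) == divmod(100, 7)       # (14, 2)
```

Eight iterations for eight quotient bits makes the cost of one-bit-per-cycle obvious, and why hardware pays for lookup tables to do better.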
S Dec 6, 2015 at 14:18 history suggested Roy Shmuli CC BY-SA 3.0
Using code
Dec 6, 2015 at 13:44 review Suggested edits
S Dec 6, 2015 at 14:18
Feb 23, 2015 at 20:23 comment added Stephen Canon "The main cause of the error in floating point division, are the division algorithms used to calculate the quotient" is a very misleading thing to say. For an IEEE-754 conforming division, the only cause of error in floating-point division is the inability of the result to be exactly represented in the result format; the same result is computed regardless of the algorithm that is used.
Jun 11, 2014 at 10:58 comment added KernelPanik @james large Thanks for catching that. I edited the reply to clarify that most floating point operations have an error less than 1/2 of one ulp. There are some special cases where the result can be exact (like adding zero).
Jun 11, 2014 at 10:56 history edited KernelPanik CC BY-SA 3.0
Edited to clarify errors occur in most operations but not all
Jun 10, 2014 at 16:31 comment added Solomon Slow (1) Floating point numbers do not have error. Every floating point value is exactly what it is. Most (but not all) floating point operations give inexact results. For example, there is no binary floating point value that is exactly equal to 1.0/10.0. Some operations (e.g., 1.0 + 1.0) do give exact results on the other hand.
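Solomon Slow's distinction — values are exact, operations may be inexact — is directly testable; a small sketch using exact rationals:

```python
from fractions import Fraction

# An exact operation: operands and result are all representable doubles
assert 1.0 + 1.0 == 2.0
assert Fraction(1.0) + Fraction(1.0) == Fraction(1.0 + 1.0)   # no rounding occurred

# An inexact operation: the true quotient 1/10 is not a double,
# so 1.0/10.0 must round to the nearest representable value
assert Fraction(1.0) / Fraction(10.0) == Fraction(1, 10)      # exact rational math
assert Fraction(1.0 / 10.0) != Fraction(1, 10)                # float result differs
```

The float result of `1.0 / 10.0` is still an exact value in its own right — just a different rational than 1/10.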
Apr 24, 2014 at 11:24 history edited KernelPanik CC BY-SA 3.0
deleted 38 characters in body
Apr 24, 2014 at 11:17 comment added KernelPanik @gnasher729 Good catch. Most basic operations also have an error of less than 1/2 of one unit in the last place using the default IEEE rounding mode. Edited the explanation, and also noted that the error may be greater than 1/2 of one ulp but less than 1 ulp if the user overrides the default rounding mode (this is especially true in embedded systems).
Apr 24, 2014 at 11:15 history edited KernelPanik CC BY-SA 3.0
Edited to correct for 1/2 of one ulp in most operations
Apr 23, 2014 at 22:31 comment added gnasher729 (3) is wrong. The rounding error in a division is not less than one unit in the last place, but at most half a unit in the last place.
May 14, 2013 at 11:29 history edited KernelPanik CC BY-SA 3.0
deleted 3 characters in body
Apr 25, 2013 at 11:10 history edited KernelPanik CC BY-SA 3.0
Clarified Radices in Relation to Floating Point Division
Apr 18, 2013 at 20:50 history edited KernelPanik CC BY-SA 3.0
Apr 18, 2013 at 20:44 history edited KernelPanik CC BY-SA 3.0
Apr 18, 2013 at 17:36 history edited KernelPanik CC BY-SA 3.0
Apr 18, 2013 at 13:23 history edited KernelPanik CC BY-SA 3.0
deleted 16 characters in body
Apr 18, 2013 at 11:59 history edited KernelPanik CC BY-SA 3.0
added 768 characters in body
Apr 18, 2013 at 11:52 history answered KernelPanik CC BY-SA 3.0