7

We know that floating point is broken, because decimal numbers can't always be perfectly represented in binary. They're rounded to a number that can be represented in binary; sometimes that number is higher, and sometimes it's lower. In this case, using the ubiquitous IEEE 754 double format, both 0.1 and 0.4 round higher:

0.1 = 0.1000000000000000055511151231257827021181583404541015625
0.4 = 0.40000000000000002220446049250313080847263336181640625

Since both of these numbers are high, you'd expect their sum to be high as well. Perfect addition should give you 0.5000000000000000277555756156289135105907917022705078125, but instead you get a nice exact 0.5. Why?
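Python's `float` is an IEEE 754 double, so the behavior can be reproduced directly; constructing `decimal.Decimal` from a float reveals the exact value stored:

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value each literal was rounded to.
print(Decimal(0.1))   # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.4))   # 0.40000000000000002220446049250313080847263336181640625

# Both stored values are high, yet the sum is exactly 0.5.
print(0.1 + 0.4 == 0.5)    # True
print(Decimal(0.1 + 0.4))  # 0.5
```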


The question Is floating point math broken? was already identified above, but this question is different. It asks for a further level of detail about a non-intuitive result, even after taking the answers to that question into consideration.

3
  • Related stackoverflow.com/questions/588004/… (but not duplicate) Jan 22, 2018 at 4:16
  • 9
    We know that floating point is broken, We (those of us who know, and I think you are in that class) can't know that because it isn't true. What is broken is the understanding that many programmers have of floating-point arithmetic. Since this seems to be your effort to provide a canonical Q&A I don't think it should start with that misleading statement. Jan 22, 2018 at 6:53
  • @HighPerformanceMark I needed a way to indicate this wasn't your typical floating-point accuracy question, and maybe I was a little over dramatic. And it wasn't intended to be canonical really, it's a genuine question that someone asked me and I struggled to come up with an answer. But as long as I had an answer I thought I would present it and let it slug it out with the others. Jan 22, 2018 at 14:24

2 Answers

5

This calculation behaves this way because the addition pushes the result into another (binary) order of magnitude. This adds a significant bit to the left (most-significant side) and therefore has to drop a bit on the right (least-significant side).

The first number, 0.1, is stored in binary as a number between 2^-4 == 1/16 and 2^-3 == 1/8. The second number, 0.4, is stored in binary as a number between 2^-2 == 1/4 and 2^-1 == 1/2. The sum, 0.5, is the number 2^-1 == 1/2 or a little larger. This is a mismatch in magnitudes and can cause loss of digits.

Here is an easier-to-understand example. Let's say we are working on a decimal computer that can store four decimal digits in floating point. Let's also say we want to add the numbers 10/3 and 20/3. These may end up stored as

3.334

and

6.667

both of which are a little high. When we add those numbers, we expect the sum to also be a little high, namely

10.001

but notice that we have moved into a new order of magnitude with our result. The full result has five decimal digits, which will not fit. So the computer rounds the result to just four decimal digits and gets the sum

10.00

which, surprisingly, is the exact correct answer to 10/3 + 20/3.
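Python's `decimal` module can play the role of this four-digit decimal computer; here is a sketch that sets the context precision to 4 and reproduces the final rounding step:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4  # simulate a four-digit decimal computer

a = Decimal("3.334")   # 10/3, stored a little high
b = Decimal("6.667")   # 20/3, stored a little high

# The exact sum 10.001 has five digits, so it is rounded to four.
print(a + b)  # 10.00
```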

I get the same kind of thing often in my U.S. high-school Chemistry and Physics classes. When a calculation moves to a new order of magnitude, strange things happen with precision and significant digits.

1
  • I like the base-10 example, it makes the case a little more accessible. I just hope it doesn't confuse as much as it illuminates. Jan 22, 2018 at 20:34
3

Although most decimal numbers need to be rounded to fit into binary, some don't. 0.5 can be exactly represented in binary, since it's 2^-1.

Floating point isn't just binary; it also has limited precision. Here are the exact sum and the two closest IEEE 754 double representable numbers on either side of it:

0.5000000000000000000000000000000000000000000000000000000 (the double just below)
0.5000000000000000277555756156289135105907917022705078125 (the exact sum)
0.5000000000000001110223024625156540423631668090820312500 (the double just above)

It's clear that the exact 0.5 is closest to the true sum. IEEE 754 requires the basic operations to be correctly rounded, and in the default round-to-nearest, ties-to-even mode you get the representable number closest to the exact result.
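A quick check in Python, whose `float` is an IEEE 754 double: `math.nextafter` (available since Python 3.9) gives the neighboring double above 0.5, and `Decimal` shows its exact value:

```python
import math
from decimal import Decimal

# The two doubles bracketing the exact sum:
below = 0.5
above = math.nextafter(0.5, 1.0)  # next representable double toward 1.0

print(Decimal(below))  # 0.5
print(Decimal(above))  # 0.50000000000000011102230246251565404236316680908203125

# Round-to-nearest picks 0.5, which is closer to the exact sum.
print(0.1 + 0.4 == below)  # True
```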

3
  • 1
    I also don't think this is a terribly good canonical answer to the question asked in the title. It makes no reference to the decimal representations of the floating-point numbers closest to 0.1 and 0.4. Jan 22, 2018 at 10:47
  • @HighPerformanceMark I didn't intend for it to be canonical, if you can do a better job of explaining it then please leave an answer! Jan 22, 2018 at 14:18
  • 2
    Thanks for the invitation to stick my head between the jaws of this particular lion. I'm not sure either SO or the Internet at large needs yet another explanation of how floating-point arithmetic is different from the decimal arithmetic most of us learnt in school. Even less sure that I am the person to write it. Jan 22, 2018 at 14:38
