Why do some numbers lose accuracy when stored as floating point numbers?
For example, the decimal number 9.2 can be expressed exactly as a ratio of two decimal integers (92/10), both of which can be expressed exactly in binary (0b1011100/0b1010). However, the same ratio stored as a floating point number is never exactly equal to 9.2:
32-bit "single precision" float: 9.19999980926513671875
64-bit "double precision" float: 9.199999999999999289457264239899814128875732421875
How can such an apparently simple number be “too big” to express in 64 bits of memory?
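One way to reproduce the two values above is to ask Python for the exact decimal expansion of what is actually stored. This is only a sketch: the helper names `exact_single` and `exact_double` are made up for illustration, and it assumes Python's `float` is an IEEE 754 double, which holds on mainstream platforms.

```python
import struct
from decimal import Decimal


def exact_double(x: float) -> Decimal:
    """Exact decimal expansion of the 64-bit double that stores x."""
    # Decimal(float) converts the stored binary value without any rounding.
    return Decimal(x)


def exact_single(x: float) -> Decimal:
    """Exact decimal expansion of x after rounding it to a 32-bit float."""
    # Packing with the "f" format rounds to single precision; unpacking
    # gives that rounded value back as a Python float.
    (rounded,) = struct.unpack("<f", struct.pack("<f", x))
    return Decimal(rounded)


single_92 = exact_single(9.2)
double_92 = exact_double(9.2)
print("single:", single_92)
print("double:", double_92)
```

Neither printed value is 9.2; each is the nearest number the format can hold.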
This isn’t a full answer (mhlester already covered a lot of good ground I won’t duplicate), but I would like to stress how much the representation of a number depends on the base you are working in.
Consider the fraction 2/3.
In good ol’ base 10, we typically write it out as something like
 0.666…
 0.666
 0.667
When we look at those representations, we tend to associate each of them with the fraction 2/3, even though only the first representation is mathematically equal to the fraction. The second and third representations/approximations have an error on the order of 0.001, which is actually much worse than the error between 9.2 and 9.1999999999999993. In fact, the second representation isn’t even rounded correctly! Nevertheless, we don’t have a problem with 0.666 as an approximation of the number 2/3, so we shouldn’t really have a problem with how 9.2 is approximated in most programs. (Yes, in some programs it matters.)
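To put numbers on that comparison, here is a small sketch using Python's `fractions` module, which does exact rational arithmetic. The variable names are illustrative only, and it assumes `float` is an IEEE 754 double.

```python
from fractions import Fraction

two_thirds = Fraction(2, 3)

# Errors of the truncated and the rounded base-10 forms of 2/3.
err_666 = abs(two_thirds - Fraction(666, 1000))  # exactly 1/1500
err_667 = abs(two_thirds - Fraction(667, 1000))  # exactly 1/3000

# Fraction(9.2) converts the stored double exactly, so this is the true
# distance between the stored value and the decimal number 9.2.
err_92 = abs(Fraction(92, 10) - Fraction(9.2))

print("0.666 vs 2/3:", float(err_666))
print("0.667 vs 2/3:", float(err_667))
print("double vs 9.2:", float(err_92))
```

The error in the stored 9.2 comes out roughly twelve orders of magnitude smaller than the error in 0.666, and 0.667 is the closer of the two three-digit forms.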
Number bases
So here’s where number bases are crucial. If we were trying to represent 2/3 in base 3, then
(2/3)_{10} = 0.2_{3}
In other words, we have an exact, finite representation for the same number by switching bases! The takeaway is that even though you can convert any number to any base, all rational numbers have exact finite representations in some bases but not in others.
To drive this point home, let’s look at 1/2. It might surprise you that even though this perfectly simple number has an exact representation in base 10 and 2, it requires a repeating representation in base 3.
(1/2)_{10} = 0.5_{10} = 0.1_{2} = 0.1111…_{3}
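The expansions above can be checked with the usual digit-by-digit method: multiply the fractional part by the base, peel off the integer part as the next digit, and repeat. The following is a sketch; `fractional_digits` is a name invented for this example, and `max_digits` is simply a cutoff so that repeating expansions stop somewhere.

```python
from fractions import Fraction


def fractional_digits(value: Fraction, base: int, max_digits: int = 12):
    """Return (digits, terminated) for the fractional part of value in base."""
    # Drop the integer part; only the digits after the point matter here.
    remainder = value - (value.numerator // value.denominator)
    digits = []
    while remainder != 0 and len(digits) < max_digits:
        remainder *= base
        digit = remainder.numerator // remainder.denominator
        digits.append(digit)
        remainder -= digit
    # terminated is True only if the expansion ended before the cutoff.
    return digits, remainder == 0


print(fractional_digits(Fraction(2, 3), 3))    # base 3
print(fractional_digits(Fraction(1, 2), 2))    # base 2
print(fractional_digits(Fraction(1, 2), 3))    # base 3, repeats
print(fractional_digits(Fraction(92, 10), 2))  # base 2, repeats
```

The last call shows the situation from the question: the fractional part of 9.2 cycles through 0011 forever in base 2, so any finite number of bits has to cut it off.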
Why are floating point numbers inaccurate?
Because oftentimes, they are approximating rationals that cannot be represented finitely in base 2 (the digits repeat), and in general they are approximating real (possibly irrational) numbers which may not be representable in finitely many digits in any base.
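Whether a given rational falls into the "digits repeat" category can be decided without generating any digits: a fraction in lowest terms has a finite expansion in base N exactly when every prime factor of its denominator also divides N. A minimal sketch of that test follows; the function name `terminates_in_base` is hypothetical.

```python
from fractions import Fraction
from math import gcd


def terminates_in_base(value: Fraction, base: int) -> bool:
    """True if value has a finite digit expansion in the given integer base."""
    denominator = value.denominator  # Fraction always stores lowest terms
    common = gcd(denominator, base)
    # Strip every factor the denominator shares with the base.
    while common > 1:
        while denominator % common == 0:
            denominator //= common
        common = gcd(denominator, base)
    # Anything left over is a prime factor the base cannot absorb.
    return denominator == 1


for base in (2, 3, 10):
    print(base, terminates_in_base(Fraction(92, 10), base))
```

For 9.2 the reduced denominator is 5, which divides 10 but not 2 or 3, so only base 10 of the three gives a finite expansion.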
So in other words, base 3 would be perfect for 1/3 just as base 10 is perfect for 1/10. Neither fraction works in base 2. – mhlester, Feb 20, 2014 at 1:19
@mhlester Yes. And in general, base N is perfect for any fraction whose denominator is N or a power thereof. Feb 20, 2014 at 1:20
And this is one reason why some numerical toolboxes keep track of “what was divided by what”, and in the process can keep “infinite accuracy” for all rational numbers. Just like physicists like to keep their equations symbolic until the last possible moment, in case factors of π etc. cancel out. – Floris, Feb 20, 2014 at 1:39
@Floris I’ve also seen cases where an algorithm that only performs basic arithmetic (i.e., preserves rationality of input) determines whether the input was (likely) rational, performs the math using normal floating point arithmetic, then re-estimates a rational approximation at the end to fix any rounding errors. In particular, Matlab’s reduced row echelon form algorithm does this, and it helps numerical stability tremendously.
Feb 20, 2014 at 1:54
@SchighSchagh – interesting, I didn’t know that. I do know that numerical stability is something that is not taught sufficiently in these days of double double precision. Which means that many miss learning about the elegance of many beautiful algorithms. I really like algorithms that compute and correct their own errors.
– Floris, Feb 20, 2014 at 2:04
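The "compute in floating point, then re-estimate a rational" idea from the comments above can be sketched with `Fraction.limit_denominator` from Python's standard library. This illustrates the general idea only and is not a description of how Matlab implements it; the helper name `reestimate_rational` and the denominator bound of 1000 are assumptions made for the example.

```python
from fractions import Fraction


def reestimate_rational(x: float, max_denominator: int = 1000) -> Fraction:
    """Closest fraction to x whose denominator is at most max_denominator."""
    # Fraction(x) is the exact stored binary value; limit_denominator then
    # snaps it to the nearest "simple" fraction under the given bound.
    return Fraction(x).limit_denominator(max_denominator)


float_result = 0.1 + 0.2  # ordinary floating point arithmetic
recovered = reestimate_rational(float_result)

print(float_result == 0.3)  # rounding error is visible here
print(recovered)            # the rational the inputs "meant"
```

This only works when the true answer really is a rational with a small denominator; the bound has to be chosen with that in mind.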
While all of the other answers are good, there is still one thing missing:
It is impossible to represent irrational numbers (e.g. π, sqrt(2), log(3), etc.) precisely!
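That follows from what irrational means: such a number is not a ratio of two integers, so its digit expansion never terminates or repeats in any integer base. No amount of bit storage in the world would be enough to hold the digits of even one of them. Only symbolic arithmetic is able to preserve their precision.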
If you limit your math needs to rational numbers only, however, the problem of precision becomes manageable. You would need to store a pair of (possibly very big) integers a and b to hold the number represented by the fraction a/b. All your arithmetic would have to be done on fractions, just like in high school math (e.g. a/b * c/d = ac/bd).
But of course you would still run into the same kind of trouble when pi, sqrt, log, sin, etc. are involved.
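Python's standard `fractions.Fraction` type is one ready-made version of such an (a, b) pair of arbitrarily large integers. The sketch below uses illustrative variable names and assumes `float` is an IEEE 754 double. It shows exact rational arithmetic, and then the point where sqrt forces a fall back to an approximation.

```python
import math
from fractions import Fraction

# a/b * c/d = ac/bd, stored in lowest terms.
product = Fraction(92, 10) * Fraction(2, 3)

# Exact where binary floating point is not.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
float_sum = 0.1 + 0.2

# sqrt(2) is irrational, so math.sqrt can only return a nearby rational.
# Squaring that value exactly can therefore never give back 2.
root = Fraction(math.sqrt(2))
root_squared = root * root

print(product, exact_sum, float_sum)
print(root_squared == 2)
```

The fractions stay exact through any amount of +, -, * and /, at the cost of numerators and denominators that may keep growing.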
TL;DR
For hardware-accelerated arithmetic, only a limited number of rational numbers can be represented. Every non-representable number is approximated. Some numbers (i.e. the irrationals) can never be represented, no matter the system.
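As a rough illustration of "only a limited number of rationals", the sketch below looks at the two adjacent doubles that bracket 9.2. It assumes Python 3.9 or later (for `math.nextafter` and `math.ulp`) and an IEEE 754 double `float`; the variable names are illustrative.

```python
import math
from decimal import Decimal

stored = 9.2                              # nearest double to decimal 9.2
above = math.nextafter(stored, math.inf)  # the next representable double
gap = math.ulp(stored)                    # spacing between doubles here

print("stored:", Decimal(stored))
print("above: ", Decimal(above))
print("gap:   ", gap)

# The decimal number 9.2 sits strictly between two neighbouring doubles,
# so it has to be rounded to one of them.
print(Decimal(stored) < Decimal("9.2") < Decimal(above))
```

No double exists between `stored` and `above`, so every real number in that gap is approximated by one of the two.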
Irrational numbers can be (only) represented in their base. For example, pi is 10 in base pi.
– phuclv, Jun 24, 2014 at 14:21
Point remains valid: Some numbers can never be represented no matter the system. You don’t gain anything by changing your base because then some other numbers can not be represented anymore.
Jun 27, 2014 at 4:10
All constructible real numbers* can be represented exactly given an appropriate base; the choice of base is in fact infinite for any particular number. E.g., pi is 10 in base pi, and it is 100 in base sqrt(pi). In general, x is 10 in base x, and it is 100 in base x^(1/2), 1000 in base x^(1/3), etc. *Nonconstructible reals, if you allow for them via your choice of axioms, uhhh yeah shit gets real weird and nobody cares about digits anymore anyway. Regardless of all this, these esoteric bases are not really useful; and there are always irrational numbers regardless of your choice of base.
Aug 5, 2021 at 21:08
Discussion of this post on Meta
Jun 12, 2014 at 18:36
Refer to is floating math broken
May 23, 2017 at 9:12
medium.com/swlh/…
May 22, 2020 at 19:42
