Floating Point Arithmetic in C++

Learn C++ free with free4dev
This entry is part 4 of 6 in the series Learn C++

Floating Point Numbers are Weird

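The first mistake that nearly every programmer makes is presuming that a loop like this will work as intended: start a floating-point variable a at 0 and, on every pass, add a to a running total and then increase a by 0.01, stopping once a is equal to 2.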

The novice programmer assumes that this will sum up every single number in the range 0, 0.01, 0.02, 0.03, ..., 1.97, 1.98, 1.99, to yield the result 199 – the mathematically correct answer.

Two things happen that make this untrue:

  1. The program as written never concludes. The variable a never becomes exactly equal to 2, so the loop never terminates.
  2. If we rewrite the loop logic to check a < 2 instead, the loop terminates, but the total ends up being something different from 199. On IEEE 754-compliant machines, it will often sum up to about 201 instead, which is what one extra pass with a just below 2 would add.

The reason this happens is that floating-point numbers store approximations of their assigned values.

The classic example is checking whether 0.1 + 0.2 is equal to 0.3. On IEEE 754-compliant machines, that comparison fails.

Though what we, the programmers, see is three numbers written in base 10, what the compiler (and the underlying hardware) sees is binary numbers. Because 0.1, 0.2, and 0.3 require exact division by 10, which is easy in a base-10 system but impossible in a base-2 system, these numbers have to be stored in imprecise form, similar to how the number 1/3 has to be written in the imprecise form 0.333333333333333... in base 10.
