**Q:**
When I set a `float` variable to, say, 3.1,
why is `printf` printing it as 3.0999999?

**A:**
Most computers use base 2
for floating-point numbers
as well as for integers,
and
just
as for base 10,
not all fractions are representable exactly in base 2.
It's well-known that
in base 10,
a fraction like
1/3 = 0.333333...
repeats
infinitely.
It turns out that
in
base 2,
one tenth
is
also
an infinitely-repeating fraction
(0.0001100110011...),
so
exact decimal fractions
such as 3.1
cannot be represented exactly in binary.
Depending on how carefully
your compiler's binary/decimal conversion routines
(such as those used by `printf`)
have been written,
you may see discrepancies when
numbers
not exactly representable in base 2
are assigned or read in and then printed
(i.e. converted from base 10 to base 2 and back again).
[footnote]
See also question 14.6.

**Q:**
I'm trying to take some
square roots,
and I've simplified the code down to

main() { printf("%f\n", sqrt(144.)); }but I'm still getting crazy numbers.

**A:**
Make sure that you have #included `<math.h>`,
and
correctly declared other functions returning `double`.
(Another
library function
to be careful with
is `atof`,
which is declared in `<stdlib.h>`.)
See also questions
1.25,
14.3,
and
14.4a.

References: CT&P Sec. 4.5 pp. 65-6

**Q:**
I'm trying to do some simple
trig,
and I am #including `<math.h>`,
but
the linker keeps
complaining
that functions like `sin`
and `cos`
are undefined.

**A:**
Make sure you're actually linking with the math library.
For instance,
due to a longstanding bug in Unix and Linux systems,
you usually need to use an explicit `-lm`
flag,
at the
*end*
of the command line,
when compiling/linking.
See also questions
13.25,
13.26,
and
14.2.

**Q:**
My floating-point calculations are acting strangely and giving me
different answers on different machines.

**A:**
First,
see question 14.2.

If the problem isn't that simple, recall that digital computers usually use floating-point formats which provide a close but by no means exact simulation of real number arithmetic. Among other things, the associative and distributive laws do not hold completely; that is, order of operation may be important, and repeated addition is not necessarily equivalent to multiplication. Underflow, cumulative precision loss, and other anomalies are often troublesome.

Don't assume that floating-point results will be exact,
and especially don't assume that floating-point values can be
compared for equality.
(Don't throw haphazard ``fuzz factors'' in, either;
see question 14.5.)
Beware that some machines have more precision available
in
floating-point computation registers
than in `double` values stored in memory,
which can lead
to floating-point inequalities
when it would seem that two values just
*have*
to be equal.

These problems are no worse for C than they are for any other computer language. Certain aspects of floating-point are usually defined as ``however the processor does them'' (see also questions 11.33 and 11.34), otherwise a compiler for a machine without the ``right'' model would have to do prohibitively expensive emulations.

This document cannot begin to list the pitfalls associated with, and workarounds appropriate for, floating-point work. A good numerical programming text should cover the basics; see also the references below. (Beware, though, that subtle problems can occupy numerical analysts for years.)

References:
Kernighan and Plauger,
*The Elements of Programming Style* Sec. 6 pp. 115-8

Knuth, Volume 2 chapter 4

David Goldberg,
``What Every Computer Scientist Should Know
about Floating-Point Arithmetic''

**Q:**
I'm sure I've got the
trig functions
declared correctly,
but they're still giving me wrong answers.

**A:**
You weren't handing them angles in degrees, were you?
C's trig functions (like FORTRAN's and most other languages)
accept angles in radians.
The conversion from degrees to radians is simple enough:

sin(degrees * pi / 180)

**Q:**
What's a good way to check for ``close enough''
floating-point equality?

**A:**
Since the absolute accuracy of floating point values varies,
by definition,
with their magnitude,
the
best way
of comparing two floating point values
is to use an accuracy threshold
which is relative
to the magnitude of the numbers being compared.
Rather than

double a, b; ... if(a == b) /* WRONG */use something like

#include <math.h> if(fabs(a - b) <= epsilon * fabs(a))where

A decidedly inferior approach, not generally recommended, would be to use an absolute threshold:

if(fabs(a - b) < 0.001) /* POOR */Absolute ``fuzz factors'' like 0.001 never seem to work for very long, however. As the numbers being compared change, it's likely that two small numbers that should be taken as different happen to be within 0.001 of each other, or that two large numbers, which should have been treated as equal, differ by more than 0.001 . (And, of course, the problems merely shift around, and do not go away, when the fuzz factor is tweaked to 0.005, or 0.0001, or any other absolute number.)

Doug Gwyn suggests using the following ``relative difference'' function. It returns the relative difference of two real numbers: 0.0 if they are exactly the same, otherwise the ratio of the difference to the larger of the two.

#define Abs(x) ((x) < 0 ? -(x) : (x)) #define Max(a, b) ((a) > (b) ? (a) : (b)) double RelDif(double a, double b) { double c = Abs(a); double d = Abs(b); d = Max(c, d); return d == 0.0 ? 0.0 : Abs(a - b) / d; }Typical usage is

if(RelDif(a, b) <= TOLERANCE) ...

References: Knuth Sec. 4.2.2 pp. 217-8

**Q:**
How do I round numbers?

**A:**
The simplest
and most straightforward
way is with code like

(int)(x + 0.5)C's floating to integer conversion truncates (discards) the fractional part, so adding 0.5 before truncating arranges that fractions >= 0.5 will be rounded up. (This technique won't work properly for negative numbers, though, for which you could use something like

You can round to a certain precision by scaling:

(int)(x / precision + 0.5) * precisionHandling negative numbers, or implementing even/odd rounding, is slightly trickier.

Note that because truncation is otherwise the default, it's usually a good idea to use an explicit rounding step when converting floating-point numbers to integers. Unless you're careful, it's quite possible for a number which you thought was 8.0 to be represented internally as 7.999999 and to be truncated to 7.

Additional links: further reading

**Q:**
Why doesn't C have an
exponentiation operator?

**A:**
One reason is
probably
that
few processors have
a built-in
exponentiation instruction.
C has a
`pow` function
(declared in `<math.h>`)
for performing exponentiation,
although explicit multiplication is usually better
for small positive integral exponents.
[footnote]
In other words, `pow(x, 2.)`
is probably inferior
to
`x * x`.
(If you're tempted to make
a `Square()` macro,
though,
check question 10.1 first.)

References:
ISO Sec. 7.5.5.1

H&S Sec. 17.6 p. 393

**Q:**
The predefined constant `M_PI`
seems to be missing from
my machine's copy of
`<math.h>`.

**A:**
That constant
(which is apparently supposed to be the value of pi,
accurate to the machine's precision),
is not
standard;
in fact a standard-conforming copy of `<math.h>`
should
*not* `#define` a symbol `M_PI`.
[footnote]
If you need pi, you'll have to define it yourself,
or compute it with `4*atan(1.0)`
or
`acos(-1.0)`.
(You could use
a
construction
like

#ifndef M_PI #define M_PI 3.1415926535897932385 #endifto provide your own #definition only if some system header file has not.)

References: PCS Sec. 13 p. 237

**Q:**
How do I
set variables to, or
test for
IEEE NaN
(``Not a Number'')
and other special values?

**A:**
Many systems
with high-quality IEEE floating-point implementations provide
facilities
(e.g.
predefined constants,
and functions
like
`isnan()`,
either
as nonstandard extensions
in `<math.h>`
or perhaps
in
`<ieee.h>`
or
`<nan.h>`)
to deal with these values cleanly,
and
work is being done
to formally standardize such facilities.
A crude but usually effective test for NaN
can be written based on the fact that IEEE NaN's
never compare equal to anything,
even themselves;
therefore
a number that doesn't compare equal to itself
must be a NaN:

#define isnan(x) ((x) != (x))Beware, though, that non-IEEE-aware compilers may optimize the test away. (Note also that even if you do have a predefined constant like

C99 provides
`isnan()`,
`fpclassify()`,
and several other classification routines.

Another
possibility
is
to format
the value in question
using
`sprintf`:
on
many systems
it
generates
strings like `"NaN"` and `"Inf"`
which you could compare for
in a pinch.

To initialize variables with these values (and if your system does not provide cleaner solutions), you may be able to get away with some compile-time ``arithmetic'':

double nan = 0./0.; double inf = 1./0.;Don't be too surprised, though, if these don't work (or if they abort the compiler with a floating-point exception).

(The most reliable way of setting up these special values would use a hex representation of their internal bit patterns, but initializing a floating-point value with a bit pattern would require using a union or some other type punning mechanism and would obviously be machine-dependent.)

See also question 19.39.

References: C9X Sec. 7.7.3

**Q:**
How can I
handle
floating-point exceptions gracefully?

**A:**
See question 19.39.

**Q:**
What's a good way to implement complex numbers in C?

**A:**
It is straightforward to define a simple structure
and some arithmetic functions to manipulate them.
C99 supports `complex` as a standard type.
[footnote]
Here is a
tiny
example,
to give you a feel for it:

typedef struct { double real; double imag; } complex; #define Real(c) (c).real #define Imag(c) (c).imag complex cpx_make(double real, double imag) { complex ret; ret.real = real; ret.imag = imag; return ret; } complex cpx_add(complex a, complex b) { return cpx_make(Real(a) + Real(b), Imag(a) + Imag(b)); }You can use these routines with code like

complex a = cpx_make(1, 2); complex b = cpx_make(3, 4); complex c = cpx_add(a, b);or, even more simply,

complex c = cpx_add(cpx_make(1, 2), cpx_make(3, 4));

See also questions 2.7, 2.10, and 14.12.

References: C9X Sec. 6.1.2.5, Sec. 7.8

**Q:**
I'm looking for some
code to do:

Fast Fourier Transforms (FFT's)

matrix arithmetic (multiplication, inversion, etc.)

complex arithmetic

**A:**
Ajay Shah
has prepared a nice
index of free numerical
software
which has been archived pretty widely;
one URL
is
`ftp://ftp.math.psu.edu/pub/FAQ/numcomp-free-c`
.
See also
questions 18.9b, 18.13, 18.15c, and 18.16.

**Q:**
I'm having trouble with a Turbo C program which crashes and says
something like
``floating point formats not linked.''

**A:**
Some compilers for small machines, including
Turbo C
(and Ritchie's
original PDP-11 compiler),
leave out
certain
floating point support if it looks like it will not be needed.
In particular, the non-floating-point versions
of `printf`
and `scanf`
save space by not including
code to handle `%e`, `%f`,
and `%g`.
It happens that
Borland's
heuristics for determining
whether the program uses floating point
are insufficient,
and the
programmer must
sometimes
insert
a dummy
call
to a floating-point library function
(such as `sqrt`; any will
do)
to force loading of
floating-point support.
(See the comp.os.msdos.programmer FAQ list for more
information.)

A partially-related problem, resulting in a similar error message (perhaps ``floating point not loaded'') can apparently occur under some MS-DOS compilers when an incorrect variant of the floating-point library is linked. Check your compiler manual's description of the various floating-point libraries.

Additional links: another possibility

Read sequentially: prev next up