Notes on data types & floating-point arithmetic in C

Common data types in C

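Data type               stdint.h    Literal    printf format          inttypes.h   Range macros
                        equivalent  suffix                            macro        (limits.h, float.h, stdint.h)
double                  -           none       %e %E %f %g %G         -            DBL_MAX, DBL_MIN, DBL_EPSILON
float                   -           f or F     %e %E %f %g %G         -            FLT_MAX, FLT_MIN, FLT_EPSILON
long double (1)         -           l or L     %Le %LE %Lf %Lg %LG    -            LDBL_MAX, LDBL_MIN, LDBL_EPSILON
__float128 (2)          -           -          -                      -            -
int                     int32_t     none       %d                     PRId32       INT_MAX, INT_MIN
unsigned int            uint32_t    u or U     %u                     PRIu32       UINT_MAX (minimum is 0)
long                    int64_t     l or L     %ld                    PRId64       LONG_MAX, LONG_MIN
unsigned long           uint64_t    ul or UL   %lu                    PRIu64       ULONG_MAX (minimum is 0)
long long int           int64_t     ll or LL   %lld                   PRId64       LLONG_MAX, LLONG_MIN
unsigned long long int  uint64_t    ull or ULL %llu                   PRIu64       ULLONG_MAX (minimum is 0)
__int128_t (2)          -           -          -                      -            -
__uint128_t (2)         -           -          -                      -            -
intmax_t                intmax_t    -          %jd                    PRIdMAX      INTMAX_MAX, INTMAX_MIN
uintmax_t               uintmax_t   -          %ju                    PRIuMAX      UINTMAX_MAX
ssize_t (3)             -           -          %zd                    -            -
size_t (3)              -           -          %zu                    -            SIZE_MAX
intptr_t (5)            intptr_t    -          -                      PRIdPTR      INTPTR_MAX, INTPTR_MIN
uintptr_t (5)           uintptr_t   -          -                      PRIuPTR      UINTPTR_MAX
ptrdiff_t (4)           -           -          %td                    -            PTRDIFF_MAX, PTRDIFF_MIN

Notes:
The stdint.h equivalents assume a 32-bit int and the LP64 model (64-bit Linux). Under LLP64 (64-bit Windows), long and unsigned long are 32 bits and correspond to int32_t and uint32_t; see the next section.
Unsigned types have no *_MIN macro; their minimum is always 0.
FLT_MIN, DBL_MIN and LDBL_MIN are the smallest positive normalized values, not the most negative ones; the most negative finite values are -FLT_MAX, -DBL_MAX and -LDBL_MAX.
float and double share the same printf conversions because a float argument is promoted to double when passed to printf.
(1) The format of long double is platform-dependent: typically the 80-bit x87 extended format on x86 (stored in 12 bytes on 32-bit x86 and 16 bytes on x86-64), 128-bit IEEE quad precision on some targets such as AArch64 Linux, and identical to double under MSVC.
(2) Compiler extensions (GCC, Clang), not standard C. Standard printf has no conversion for them; for __float128, GCC's libquadmath provides quadmath_snprintf and the FLT128_* limits in quadmath.h.
(3) ssize_t is a POSIX type defined in unistd.h; size_t is defined in stddef.h.
(4) ptrdiff_t is defined in stddef.h and is the type of the result of subtracting two pointers.
(5) intptr_t and uintptr_t (optional in C99) can hold a converted void *. Print them with PRIdPTR and PRIuPTR (or PRIxPTR), which glibc builds from its internal __PRIPTR_PREFIX macro. The %p conversion is for void * pointer values themselves, not for these integer types.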
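A minimal sketch of the printf and inttypes.h columns in use, assuming a hosted C99 (or later) compiler and C library such as glibc. The helper names (format_i64, format_u64, format_sizes, format_long_double, epsilon_behaves, most_negative_double) are hypothetical and exist only for illustration. The point is that the PRI* macros and the z, t and j length modifiers print fixed-width and library typedefs without the caller having to know which built-in type sits underneath.

```c
#include <float.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* PRId64/PRIu64 expand to string literals ("ld"/"lu" on LP64 Linux,
   "lld"/"llu" on LLP64 Windows), so one source line prints correctly on both. */
static int format_i64(char *buf, size_t n, int64_t v) {
    return snprintf(buf, n, "%" PRId64, v);
}

static int format_u64(char *buf, size_t n, uint64_t v) {
    return snprintf(buf, n, "%" PRIu64, v);
}

/* Length modifiers: z for size_t, t for ptrdiff_t, j for intmax_t. */
static int format_sizes(char *buf, size_t n, size_t sz, ptrdiff_t d, intmax_t m) {
    return snprintf(buf, n, "%zu %td %jd", sz, d, m);
}

/* long double needs the L modifier; passing it to a plain %g is undefined behavior. */
static int format_long_double(char *buf, size_t n, long double x) {
    return snprintf(buf, n, "%.3Lg", x);
}

/* DBL_EPSILON is the gap between 1.0 and the next larger double. Adding half
   of it is an exact tie, which round-to-nearest-even resolves back to 1.0.
   volatile forces each result to be stored as a double. */
static int epsilon_behaves(void) {
    volatile double up = 1.0 + DBL_EPSILON;
    volatile double half = 1.0 + DBL_EPSILON / 2.0;
    return up > 1.0 && half == 1.0;
}

/* DBL_MIN is the smallest positive normalized double, not the most negative
   value; the most negative finite double is -DBL_MAX. */
static double most_negative_double(void) {
    return -DBL_MAX;
}
```

Each formatting helper returns snprintf's character count, so a caller can detect truncation; for example, format_u64 on UINT64_MAX writes the 20 characters 18446744073709551615.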

LP64 vs LLP64

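I = int, L = long, LL = long long, P = pointer. The letters name the types that take the model's full width: LP64 makes long and pointers 64 bits, while LLP64 makes only long long and pointers 64 bits.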

Sizes in bits:

Data type   LP64             ILP64   LLP64              ILP32   LP32
            (64-bit Linux)           (64-bit Windows)
char        8                8       8                  8       8
short       16               16      16                 16      16
int         32               64      32                 32      16
long        64               64      32                 32      32
long long   64               64      64                 -       -
pointer     64               64      64                 32      32
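The data model of a particular build can be read off from sizeof. A small sketch: sizeof reports bytes, so each size is multiplied by CHAR_BIT (limits.h) to get bits as in the table. The names data_model and is_data_model are hypothetical helpers, not part of any library.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: name the data model from the bit widths of int, long
   and void *, matching the rows of the table. long long is 64 bits in every
   64-bit model listed, so it is not needed to tell them apart. */
static const char *data_model(void) {
    size_t int_bits  = sizeof(int) * CHAR_BIT;
    size_t long_bits = sizeof(long) * CHAR_BIT;
    size_t ptr_bits  = sizeof(void *) * CHAR_BIT;

    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64) return "LP64";  /* 64-bit Linux */
    if (int_bits == 64 && long_bits == 64 && ptr_bits == 64) return "ILP64";
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64) return "LLP64"; /* 64-bit Windows */
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32) return "ILP32";
    if (int_bits == 16 && long_bits == 32 && ptr_bits == 32) return "LP32";
    return "unknown";
}

/* Hypothetical convenience wrapper: nonzero if this build uses the named model. */
static int is_data_model(const char *name) {
    return strcmp(data_model(), name) == 0;
}
```

On 64-bit Linux this should report LP64 and on 64-bit Windows LLP64. When code depends on one model, a C11 _Static_assert such as _Static_assert(sizeof(long) == 8, "needs LP64") is often the better fit, because it fails at compile time rather than at run time.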

See here and here for details.

Floating-point handling in optimizing compilers

Compilers offer fine-grained control over the optimizations they apply to numerical code. Here are some common command-line options:

Denormals Are Zero (DAZ) / Flush To Zero (FTZ)
  GCC:                 enabled if either -ffast-math or -funsafe-math-optimizations is used
  Intel:               enabled at the default optimization level; if -O0 is used, add -ftz to enable it
  PGI:                 -Mdaz and -Mflushz
  PathScale / Open64:  enabled if either -OPT:IEEE_arith=2 or -OPT:IEEE_arith=3 is used or implied
  Sun Studio:          -fns

Legacy x87 precision control
  The significand of each result is rounded to the precision of the selected format:
  32 = single (24-bit significand), 64 = double (53-bit significand),
  80 = extended (64-bit significand). The default is 80.
  GCC:                 -mpc32, -mpc64, -mpc80
  Intel:               -pc32, -pc64, -pc80
  PGI:                 -pc 32, -pc 64, -pc 80
  PathScale / Open64:  -mx87-precision=32, -mx87-precision=64, -mx87-precision=80
  Sun Studio:          -fprecision=single, -fprecision=double, -fprecision=extended
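Whether a build actually ended up with FTZ or DAZ (the GCC entry above, for instance, ties them to -ffast-math) can be probed at run time. A sketch, assuming IEEE 754 doubles; the function names are hypothetical. The two probes are kept separate because FTZ acts on subnormal results while DAZ acts on subnormal inputs. FLT_EVAL_METHOD from float.h is included as the standard way to ask whether intermediates are evaluated in a wider (x87-style) format; being a preprocessor macro, it reflects the compiler's evaluation strategy rather than the precision setting chosen at run time.

```c
#include <float.h>

/* FTZ probe: DBL_MIN / 2 is a subnormal result. If it comes out as zero,
   subnormal results are being flushed to zero. volatile stops the compiler
   from folding the division at compile time. */
static int flush_to_zero_active(void) {
    volatile double min_normal = DBL_MIN;
    volatile double result = min_normal / 2.0;
    return result == 0.0;
}

/* DAZ probe: start from the smallest subnormal double (2^-1074, written as a
   C99 hex literal) and scale it into the normal range. The exact result,
   2^-1014, is normal, so FTZ cannot touch it; it is zero only if the
   subnormal input itself was treated as zero. */
static int denormals_are_zero_active(void) {
    volatile double subnormal = 0x1p-1074;
    volatile double result = subnormal * 0x1p60;
    return result == 0.0;
}

/* FLT_EVAL_METHOD (C99): 0 = evaluate in each operand's own type (typical
   with SSE2 on x86-64), 1 = float and double in double, 2 = everything in
   long double (typical with x87), -1 = indeterminable. */
static int float_eval_method(void) {
    return FLT_EVAL_METHOD;
}
```

Built without -ffast-math or similar flags on x86-64 or AArch64 Linux, both probes are expected to return 0; relinking the same code with -ffast-math on x86 is a quick way to see the GCC row of the table take effect.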