Intel compiler suite reference card

Intel Compiler Suite

icc
ecc
C compiler driver.

ecc is the old name of icc for the Itanium architecture (e stands for Electron, the code name of Intel's C compiler for Itanium).

icpc C++ compiler driver.
ifort
efc
Fortran compiler driver.

efc is the old name of ifort for the Itanium architecture.

mcpcom C/C++ compiler (Macro CPlus COMpiler ?)
fortcom Fortran compiler
svcpcom C/C++ source verifier/checker
svfortcom Fortran source verifier/checker
fpp Fortran preprocessor
idb Debugger
codecov Code-coverage and test-prioritization tool
xiar Library archiver. It has the same functionality as ar in GNU Binutils, but supports objects built with Interprocedural Optimization (IPO).
xild Linker. It has the same functionality as ld in GNU Binutils, but supports objects built with Interprocedural Optimization (IPO).

File extensions

.c C source files.
.h C/C++ header files.
.i Preprocessed C source files.
.C/cc/cxx/c++/cp/cpp/CPP C++ source files.
.H/hh/hxx/h++/hp/hpp/HPP/tcc C++ header files.
.s Assembler code.
.d Dependency files. They contain rules suitable for a Makefile describing the dependencies of the source file.

Created by -MD option.

Now the compiler...

The default compiler options can be placed in the icc.cfg file (for C compiler), icpc.cfg file (for C++ compiler), or ifort.cfg file (for Fortran compiler). These files can also be pointed to by ICCCFG, ICPCCFG, and IFORTCFG environmental variables.

Compile

-c Compile *.c and assemble *.s. NO linking.
-Idir Also search dir for header files.

This can also be controlled by environmental variables C_INCLUDE_PATH and CPLUS_INCLUDE_PATH.

-S Compile *.c into assembly code *.s. NO linking.
-fsource-asm Make the generated assembly code more readable by annotating it with the source code.
-E Run the preprocessor only. The output is sent to stdout.
-C When running the preprocessor, don't discard comments in the program.
-dM When used with the -E option, display definitions of all built-in macros, e.g.

   gcc -E -dM - < /dev/null

-o file Place output in file
-dryrun Display the programs invoked by the driver, but do not compile.
--version
-dumpversion
Print the version number.
-dumpmachine Print the machine info.
@file Read command-line options from file. The options read are inserted in place of the original @file option.
-Bprefix Search prefix for Intel compiler executables.

The equivalent environmental variable is GCC_EXEC_PREFIX.

-print-multi-lib Print the search directories of system libraries

C/C++ dialect

-ansi
-strict-ansi
Strictly ISO C90 standard. In particular, C programs can't use C++ style "//" comments and inline keyword.

__INTEL_STRICT_ANSI__ will be defined.

-std=s Determine the language standard. s can be c89, c99, gnu89, gnu++98 (default), c++0x ...
-openmp Enable OpenMP.

The run-time library libiomp will be used.

_OPENMP will be defined.

-openmp-link static
-openmp-link dynamic
Whether OpenMP library should be statically or dynamically linked.
-openmp-stubs Compile an OpenMP program into serial code. All OpenMP directives are ignored.
-openmp-task model Which OpenMP tasking model to use. model can be intel or omp (default).
-openmp-report n Show diagnostic information when compiling OpenMP programs. n specifies detail levels.
-openmp-threadprivate type Specify the type of threadprivate implementation. type can be legacy (default) or compat (so it will be compatible with other compilers).
-par-num-threads=n The number of threads to use. This option will override the environmental variable OMP_NUM_THREADS

This option must be used together with either -parallel or -openmp.

-par-affinity Set the threads' CPU affinity. This option will override the environmental variable KMP_AFFINITY

This option must be used together with either -parallel or -openmp.

-pthread Enable pthread support.
-mkl Enable Intel Math Kernel library.
-ipp Enable Intel Performance Primitives library.
-use-intel-optimized-headers Use headers of Intel Performance Primitives library.
-tbb Enable Intel Threading Building Blocks library.
-cilk-serialize Enable serialization of Intel Cilk Plus code.
-funsigned-char
-fsigned-char
Whether by default char is signed or unsigned.

Fortran specific

-heap-arrays 0 Put automatic arrays on the heap instead of the stack. If your Fortran code crashes because of large arrays, this could help.

Interoperability with GCC

-gcc-name=dir
-cxxlib[=dir]
When used together, specify the full path of gcc C++ libraries.

-cxxlib is on by default

Alternatively, one can set environmental variables GXX_ROOT and GXX_INCLUDE

-gcc-version=nnn Compatibility with gcc version nnn, e.g. 340 means gcc 3.4
-gxx-name=dir Specify the full path of g++

Preprocessor

-Dname
-Dname=value
Predefine the macro name, with value 1, or with the specified value
-Uname Un-define the (built-in or -D defined) macro name
-M
-MM
Output a rule (to stdout) suitable for Makefile describing the dependencies of the source file.

-MM only outputs header files not in the system header directories.

This option implies -E option.

-MF file The output of -M is written to file. This can also be controlled by environmental variable DEPENDENCIES_OUTPUT.
-MD The same as -M -MF combined, but doesn't imply -E.

*.d files will be generated.

-Wp,opt Pass opt to the C preprocessor.

Warning messages

-Wall Enable all warnings.
-w Suppress all warnings.
-Werror Treat warnings as errors.
-Winline Warn if a function can't be inlined by the compiler but is declared as such in the program.

Link

-Ldir Also search dir for library files. This can also be controlled by environmental variable LIBRARY_PATH.
-llibrary Link to liblibrary

The linker searches libraries and object files in the order they are specified, so

    foo.o -lz bar.o

will search library z after file foo.o but before bar.o. If bar.o refers to functions in z, those references may stay unresolved, so -lz must appear AFTER bar.o

-s Remove all symbol information from the executable

-static Produce statically linked executable

-shared
-fPIC
-rdynamic
Produce shared libraries. For details, see here.

-rdynamic is needed for some uses of dlopen or to allow obtaining backtraces from within a program.

-nostartfiles Don't link to the standard startup files (so the start point of a program is not main, but _start).

To compile crt1.o, one has to use this option.

Also see here for examples.

-nodefaultlibs Don't link to the standard system libraries (e.g. libgcc.a).
-nostdlib Don't link to the standard system libraries (e.g. libgcc.a) or startup files.
-static-libgcc
-shared-libgcc
Whether libgcc should be statically or dynamically linked.
-static-intel
-shared-intel
Whether Intel-provided libraries should be statically or dynamically linked.
-Wl,opt Pass opt to the linker.
-Wl,-t
-Wl,--print-map
Enable linker to output trace/link map information.
-Wl,--start-group
-Wl,--end-group
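The archives listed between this pair are searched repeatedly by the linker until no new undefined references can be resolved. This is useful for libraries with circular dependencies.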

Debugging

-g Produce debugging information.
-trapuv Initialize stack local variables to an unusual value to aid error detection.
-debug all Produce complete debugging information.
-debug parallel Produce debugging information for the thread data sharing and reentrant call detection of the Intel Parallel Debugger Extension.
-diag-enable sc-parallel Enable analysis of parallelization in source code (parallel lint diagnostics).
-diag-enable group Enable messages of diagnostic group, which can be vec, thread, par, openmp, driver, ...
-save-temps Save all temporary/intermediate files produced during compiling.
-parallel-source-info Emit source code location when OpenMP or auto-parallelization code is generated.

Profiling

-openmp-profile Produce profiling information for OpenMP. A text file named guide.gvs ("Generated Values and Statistics"; the file name can be changed through the KMP_STATSFILE environmental variable) will be created after the run.

See here for the explanation of guide.gvs.

-prof-gen=srcpos Use together with codecov for code coverage analysis.

Optimization

-O0
-Od
Don't optimize.
-O1 Optimize.

When any optimization option is used, __OPTIMIZE__ is defined.

-O
-O2
Optimize even more.

This is default.

-O3 Optimize yet more.

In particular, it will try to inline a function whenever possible.

-Os Optimize for code size.

This enables all -O2 optimizations that don't increase code size.

-fast The same as -ipo -O3 -no-prec-div -static -xHost

Note the -static part.

-opt-report
-opt-report-level n
Show diagnostic information about optimization during compilation.

n specifies detail levels, default is 1.

-fp-model model If model is fast=1 or fast=2, then optimize floating-point arithmetic aggressively at some cost in accuracy or consistency.

Other possible values for model are precise, except, strict, source (round intermediate results to source-defined precision), double (round intermediate results to double precision), and extended (round intermediate results to extended precision).

-pcn Set the precision of the floating-point significand. n can be 32 (24-bit significand), 64 (53-bit significand), or 80 (64-bit significand).
-fp-speculation=mode Enable the compiler to speculate on floating-point operations.

mode can be fast (default mode if any optimization is on), safe, strict, or off.

-fp-port Round floating-point results after floating-point operations.
-fp-stack-check Generate extra code after every function call to ensure that the FPU register stack is in the expected state.
-fp-relaxed Generate fast but less accurate code sequences for math functions (Itanium only).
-ftz (Enabled automatically by -O3) Enable the DAZ (Denormals Are Zero) and FTZ/FZ (Flush To Zero) bits in the SSE control/status register (MXCSR). The effect is that denormal results from floating-point calculations will be set to 0, and denormal values used as input to floating-point instructions will be treated as 0. See here for details.
-fpen (Fortran) Set n to 0 to enable floating-point invalid, divide-by-0, overflow exceptions.

Set n to 3 (default) to disable all floating-point exceptions.

-fast-transcendentals Generate fast but less accurate code for transcendental functions.
-prec-div
-no-prec-div
(Fortran) Whether or not to generate slow but more accurate code for floating-point divide.
-prec-sqrt
-no-prec-sqrt
(Fortran) Whether or not to generate slow but more accurate code for floating-point square root.
-unrolln Unroll the loop at most n times.

To disable, use n=0.

-prof-gen
-prof-use
-prof-dir
Profile guided optimization (PGO).
-ipo Link time/Inter-procedural optimization.
-parallel
-par-threshold n
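Automatically parallelize loops.

n (0 to 100) sets the threshold for parallelizing a loop, based on the estimated probability that parallel execution is profitable; 0 means always parallelize. The default is 75.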

-par-report n Show diagnostic information about automatic loop parallelization during compilation.

n specifies detail levels.

-guide n Provide guidance/advice for automatic vectorization/parallelization/data transformation when the -parallel option is used.

n sets the verbose-ness.

-vec
-vec-threshold n
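Automatically vectorize loops.

n (0 to 100) sets the threshold for vectorizing a loop, based on the estimated probability that vectorization is profitable; 0 means always vectorize. The default (which is also the maximum) is 100.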

-vec-report n Show diagnostic information about loop vectorization during compilation.

n specifies detail levels, default is 1.

-march=cpu Generate code for a specific cpu, which can be core2, pentium4, or pentium3.
-mtune=cpu Tune for a specific cpu, e.g. core2, pentium4, pentium, pentiumpro, pentium-mmx, itanium, itanium2, etc.

The default is pentium4 on x86_64 and itanium2 on IA64.

-xsimd
-axsimd
Generate code for specific SSE/SIMD extensions.

simd can be avx, sse4.2, sse4.1, ssse3, sse3, sse3_atom, sse2.

-ax is similar to -x, except it also generates the non-SIMD-specific code (a=automatic processor dispatch).

-minstruction=movbe Generate movbe instruction if needed.
-opt-calloc Use optimized calloc call.
-opt-subscript-in-range Assume loop indices never overflow.
-opt-matmul Identify matrix multiplication loop nests (if any) and replace them with a matmul library call for improved performance.

Interesting features

-sox Record compiler version number and command-line options in the generated objects.

To see them, use the following commands

    objdump -sj .comment a.out
    strings -a a.out |grep comment:
    
-fstack-protector
-fstack-protector-all
Enable protection against buffer overflows such as stack smashing attacks.

Environmental variables

The following environmental variables will affect compilation:

GXX_ROOT
GXX_INCLUDE
Specify the location of the gcc binaries/headers.
IA32ROOT
IA64ROOT
Specify the location of the headers/libraries for a non-standard installation structure.
ICCCFG
ICPCCFG
IFORTCFG
Specify the compiler-default-options files.

OpenMP environmental variables

The following environmental variables will affect OpenMP programs.

KMP-prefix ones are Intel specific environmental variables (K=KAI, Kuck & Associates, Inc)

OMP-prefix ones (not shown here) are standard OpenMP environmental variables. See the previous link for the complete list.

Also see here for Intel extension routines to OpenMP.

KMP_SETTINGS [Version 12] Set to 1 to display OpenMP run-time library environmental variables.
KMP_AFFINITY Set the threads' CPU affinity.

See here for details.

KMP_ALL_THREADS Maximum number of simultaneously executing threads.
KMP_BLOCKTIME The time, in milliseconds, that a thread should wait, after completing the execution of a parallel region, before sleeping. The default is 200. Can be set to infinity.
KMP_DYNAMIC_MODE How to choose the number of threads when OMP_DYNAMIC environmental variable is set to true.

If the value is load_balance (default), then it tries to avoid using more threads than the number of available processors.

If the value is thread_limit, then it tries to avoid using more threads than the total number of processors.

If the value is asat, then it is based on parallel start time.

KMP_LIBRARY OpenMP run-time library execution mode.

If the value is throughput (default), then it is optimized for sharing resources with other programs.

If the value is turnaround, then it is for dedicated use of resources, as in HPC.

If the value is serial, then it enforces serial execution.

KMP_STACKSIZE The number of bytes (e.g. 2m, 4k) allocated for each OpenMP thread to use as the private stack for the thread.
KMP_STATSFILE The file name for OpenMP profiling option -openmp-profile

The default name is guide.gvs

KMP_CPUINFO_FILE Specify an alternate file name for file containing machine topology description

The default is /proc/cpuinfo

Math Kernel Library environmental variables

The following environmental variables will affect programs using Intel Math Kernel Library. Most of them are thread control, i.e. they will only affect programs linked to multi-threaded version of MKL.

MKL_NUM_THREADS
OMP_NUM_THREADS
Set the number of threads.
MKL_DYNAMIC Set to FALSE to disable automatic selection of number of threads.

The default is TRUE.

Note that MKL will by default ignore the logical cores created by the HyperThreading technology.

MKL_ALL
MKL_BLAS
MKL_FFT
MKL_VML
Set the number of threads for each function domain.

Alternatively, one can also use environmental variable MKL_DOMAIN_NUM_THREADS

MKL_SERIAL For older versions of MKL, set it to YES to disable multithreading.
MKL_DISABLE_FAST_MM
MKL_MM_DISABLE
Set to any value to disable fast memory management; it will cause memory to be allocated and freed from call to call, which will negatively impact performance of routines such as the level 3 BLAS functions (*gemm), especially for small problem sizes.
I_MPI_NUMBER_OF_MPI_PROCESSES_PER_NODE
I_MPI_THREAD_LEVEL
Although these are Intel-MPI environmental variable, MKL will read their values to determine the optimal number of threads to be used.
MKL_DEBUG_CPU_TYPE (Undocumented) Set to an integer between 0 and 4 (64-bit) or between 0 and 5 (32-bit), inclusive, to choose the MKL math functions optimized for a specific SSE instruction set. This setting overrides MKL's Processor Dispatch.

For 64-bit: 0 for the default SSE, 1 for SSE 2, 2 for Supplemental SSE 3, 3 for SSE 4.2, and 4 for AVX.

For 32-bit: 0 for the default SSE, 1 for SSE 2, 2 for SSE 3, 3 for Supplemental SSE 3, 4 for SSE 4.2, and 5 for AVX.

This setting has substantial performance impact on computational intensive BLAS functions (e.g. *gemm) for large problem sizes. Also, if your CPU does not support the SSE instruction set you specified, you could get "illegal instruction" error.

MKL_DEBUG_CPU_MA (Undocumented) (Tested in version 10.2.5 and 10.3) Set to an integer to choose the MKL math functions optimized for a specific MicroArchitecture: 32 for Merom, 33 for Penryn, 64 for Nehalem, 66 for Westmere, 128 for Sandy Bridge, and 0 for everything else (including AMD Barcelona).

Currently this flag is only used to distinguish between Nehalem and Westmere, that is, when MKL_DEBUG_CPU_TYPE is set for SSE 4.2, and only BLAS functions *gemm use it (and only use it on very large problem size). The performance difference is not that great though (1-2%).

For all other cases, Intel MKL infers the MicroArchitecture from the SSE instruction set and ignores the MKL_DEBUG_CPU_MA setting. For example, AVX instructions mean the MicroArchitecture is Sandy Bridge, SSE4.1 means Penryn (both Merom and Penryn support Supplemental SSE 3), etc.

Intel C/C++ Compiler built-in macros

__cplusplus Is defined if C++ compiler is in use.
__FILE__
__BASE_FILE__
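Name of the current input file (as a C string constant). __BASE_FILE__ is the name of the main input file, even when it is used inside an included header.

__FILE__ is an ANSI C standard macro.

__LINE__ Current input line number (as an integer constant)

This is an ANSI C standard macro.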

__DATE__
__TIME__
Date & time on which the preprocessor is run. (as C string constants)

These are ANSI C standard macros.

__TIMESTAMP__ Last modification time of the input file (as a C string constant)
__STDC__
__STDC_VERSION__
__STDC__ evaluates to 1 to mean the compiler conforms to the ISO C standard.

__STDC_VERSION__ evaluates to a long integer constant of the form yyyymmL (e.g. 199901L for C99).

__STDC__ is an ANSI C standard macro.

__PIC__
__pic__
Evaluate to 1 if the program is compiled with the -fPIC flag, i.e. as position-independent code.
__GNUC__
__GNUC_MINOR__
__GNUC_PATCHLEVEL__
Evaluate to integer constants representing the GNU (C/C++/Fortran) compiler version numbers (major/minor/patch level).
__ICC
__VERSION__
__ECC
Evaluate to constants representing the Intel C/C++ compiler version. __ICC and __ECC are integer constants; __VERSION__ is a C string constant.

__ECC is for Itanium only.

__INTEL_COMPILER_BUILD_DATE Evaluate to the Intel C compiler build date (yyyymmdd format).
_OPENMP Is defined if OpenMP is in effect.
KMP_VERSION_MAJOR
KMP_VERSION_MINOR
KMP_VERSION_BUILD
Evaluate to integer constants representing the Intel OpenMP version numbers and build date (yyyymmdd format).
__itanium__
__ia64__
Defined for Itanium.
__LP64__ Defined for Linux x86_64 and Itanium.
__x86_64__ Defined for x86_64.
__SSE__
__SSE2__
__SSE3__
__SSSE3__
Defined when the target processor supports the SSE/SSE2/... instructions.
__OPTIMIZE__
__OPTIMIZE_SIZE__
Is defined if any optimization flag is used.

Furthermore, __OPTIMIZE_SIZE__ is defined if the optimization is for size, not speed.

__GNUG__ Defined only when compiling C++ code. Evaluates to the major version number of the GNU C++ compiler (the same value as __GNUC__).

Intel C/C++ Compiler #pragma directives

See here for a list of Intel specific pragmas.

#pragma prefetch var1,var2... Prefetch data in variables var1,var2...
#pragma loop_count n1,n2... Specify the possible numbers n1,n2... of iterations for the loop.
#pragma unroll n Specify how many times a loop is to be unrolled.
#pragma unroll_and_jam n Unroll one or more loops higher in the nest than the innermost loop, then fuse (jam) the resulting loops back together.
#pragma swp Specify the loop to be software-pipelined.
#pragma poison symbol1 symbol2 ... symbol1 symbol2 ... are (unquoted) identifiers to be removed from the program; as with GCC's #pragma GCC poison, any later use of a poisoned identifier is reported as an error.
#pragma message string Display string (C string constant) during compilation.
#pragma weak symbol Declare symbol as a weak symbol.

A better way to achieve this is through function attribute "weak".

#pragma weak symbol1=symbol2 Declare symbol1 as a weak alias of symbol2.

A better way to achieve this is through function attributes "weak, alias".

Intel supplied libraries

i*libFNP FlexNet Publisher License server manager for Intel C, C++, and Fortran compilers
libcilkrts Intel Cilk Plus run-time library
libclompc
libclusterguide
Intel Cluster OpenMP run-time library
libcxaguard For guarded initialization of static variables in C++ code. For details, see the comments in libstdc++-v3/libsupc++/guard.cc in the GCC source tree and here
libdecimal IEEE 754-2008 Decimal floating-point arithmetic.
libguide Intel's legacy OpenMP run-time library. It also implements Intel's extensions (i.e. KMP_-prefix environmental variables)

Why is it called "Guide"? Because the OpenMP-enabled KAI compiler was called the Guide Compiler (e.g. guidec)

libguide_stats libguide for the parallelizer tool with performance statistics and profile information
libimf Intel's equivalent of libm.a
libiomp5
libiompprof5
libiompstubs5
libompstub
Intel's new OpenMP run-time library. Programs compiled with non-Intel compilers can be linked to it.

The stub libraries are used when -openmp-stubs option is used (compile an OpenMP program into serial code)

libipgo Support library for profile-guided optimization.
libirc Intel optimized run-time library. It contains the infamous Processor Dispatch code.
libirc_s Like above, but contains SSE-specific code.
libirml Intel Resource Management Layer library. It is a work dispatcher used by Threading Building Blocks (TBB).
libpdbx Parallel debugger extension runtime library.
libsvml Short vector math library.
libifcore
libifcoremt
libifcore_pic
libifcoremt_pic
Fortran run-time library.

The mt versions are for multi-threaded programs.

The _pic versions allow creation of shared libraries linked to the static version of libifcore instead of the dynamic one.

libifport Portability & POSIX support (Fortran).
libmkl_blacs_intelmpi20_* BLACS routines using Intel MPI 2.x
libmkl_blacs_intelmpi_* BLACS routines using Intel MPI 1.x or MPICH1/2
libmkl_blacs_*lp64 BLACS routines
libmkl_blacs_openmpi_* BLACS routines using Open MPI
libmkl_blacs_sgimpt_* BLACS routines using SGI Message Passing Toolkit
libmkl_cdft_core Cluster discrete Fourier transform routines
libmkl_core Math Kernel Library core functions
libmkl_gf
libmkl_gf_*lp
MKL interface library for GNU Fortran compiler
libmkl_gnu_thread MKL interface library for GNU OpenMP
libmkl_intel
libmkl_intel_*lp
MKL interface library for Intel compiler
libmkl_intel_sp2dp MKL interface library supporting Cray-style naming in user programs targeted for the Intel 64 or IA-64 architecture and using the ILP64 convention. The SP2DP interface maps single-precision names (for both real and complex types) in the user program to double-precision names in Intel MKL BLAS and LAPACK. Function names are mapped as shown in the following example for BLAS functions *gemm:

    sgemm -> dgemm
    dgemm -> dgemm
    cgemm -> zgemm
    zgemm -> zgemm

libmkl_intel_thread MKL interface library for Intel OpenMP
libmkl_lapack LAPACK routines
libmkl_def Default, untuned MKL, for SSE capable processors.
libmkl_p4n MKL for SSE 2 capable processors, i.e. Pentium 4 or better.
libmkl_mc MKL for Supplemental SSE 3 capable processors, i.e. Core or better.
libmkl_mc3 MKL for SSE 4.2 capable processors, i.e. Nehalem or better.
libmkl_avx MKL for AVX capable processors, i.e. Sandy Bridge or better.
libmkl_pgi_thread MKL interface library for PGI OpenMP
libmkl_scalapack ScaLAPACK routines
libmkl_sequential Sequential version of MKL
libmkl_solver Iterative sparse solver, trust region solver, and GNU Multiple Precision (GMP) Arithmetic Library routines
libmkl_vml_* Vector math library (*=avx is for AVX, mc3/p4m3 for SSE 4.2, mc2/p4m2 for SSE 4.1, mc/p4m for Supplemental SSE3, p4p for SSE3, p4/p4n for Pentium 4)
libmpi_mt
libmpi_dbg_mt
Thread-safe MPI library (dbg=with extra debugging information)
libmpigc3
libmpigc4
MPI interface library for GCC 3.x/4.x compilers
libmpigf MPI interface library for GNU Fortran compiler
libmpi_lustre
libmpi_panfs
libmpi_pvfs2
MPI IO library tuned for Lustre, Panasas, and PVFS parallel file systems
libtmi
libtmip_mx
libtmip_psm
Tag Matching Interface support for Qlogic PSM and Myricom MX interconnects
libtvmpi MPI debugging interface library for TotalView debugger