Intel Compiler Suite
icc ecc | C compiler driver. ecc is the old name of icc for Itanium architecture. (e stands for Electron, the code name of Intel's C compiler for Itanium) |
icpc | C++ compiler driver. |
ifort efc | Fortran compiler driver. efc is the old name of ifort for Itanium architecture. |
mcpcom | C/C++ compiler (Macro CPlus COMpiler ?) |
fortcom | Fortran compiler |
svcpcom | C/C++ source verifier/checker |
svfortcom | Fortran source verifier/checker |
fpp | Fortran preprocessor |
idb | Debugger |
codecov | Code-coverage and test-prioritization tool |
xiar | Library archiver. It has the same functionality as ar in the GNU Binutils, but is meant for objects built with Interprocedural Optimizations. |
xild | Linker. It has the same functionality as ld in the GNU Binutils, but is meant for objects built with Interprocedural Optimizations. |
File extensions
.c | C source files. |
.h | C/C++ header files. |
.i | Preprocessed C source files. |
.C/cc/cxx/c++/cp/cpp/CPP | C++ source files. |
.H/hh/hxx/h++/hp/hpp/HPP/tcc | C++ header files. |
.s | Assembler code. |
.d | Dependency files. They contain rules suitable for Makefile describing the dependencies of the source file.
Created by -MD option. |
Now the compiler...
The default compiler options can be placed in the icc.cfg file (for C compiler), icpc.cfg file (for C++ compiler), or ifort.cfg file (for Fortran compiler). These files can also be pointed to by ICCCFG, ICPCCFG, and IFORTCFG environmental variables.
Compile
-c | Compile *.c and assemble *.s. NO linking. |
-Idir | Also search dir for header files. This can also be controlled by environmental variables C_INCLUDE_PATH and CPLUS_INCLUDE_PATH. |
-S | Compile *.c into assembly codes *.s. NO linking. |
-fsource-asm | Make the generated assembly codes more readable. |
-E | Run preprocessor only. The output is sent to stdout. |
-C | When running preprocessor, don't discard comments in the program. |
-dM | When used with -E option, display definitions of all built-in macros, e.g.
gcc -E -dM - < /dev/null |
-o file | Place output in file |
-dryrun | Display the programs invoked by the driver, but do not compile. |
--version -dumpversion | Print the version number. |
-dumpmachine | Print the machine info. |
@file | Read command-line options from file. The options read are inserted in place of the original @file option. |
-Bprefix | Search prefix for Intel compiler executables. The equivalent environmental variable is GCC_EXEC_PREFIX. |
-print-multi-lib | Print the search directories of system libraries |
C/C++ dialect
-ansi -strict-ansi | Strictly ISO C90 standard. In particular, C programs can't use C++ style "//" comments and inline keyword.
__INTEL_STRICT_ANSI__ will be defined. |
-std=s | Determine the language standard. s can be c89, c99, gnu89, gnu++98 (default), c++0x ... |
-openmp | Enable OpenMP. The run-time library libiomp will be used. _OPENMP will be defined. |
-openmp-link static -openmp-link dynamic | Whether OpenMP library should be statically or dynamically linked. |
-openmp-stubs | Compile an OpenMP program into serial code. All OpenMP directives are ignored. |
-openmp-task model | Which OpenMP tasking model to be used. model can be intel or omp (default). |
-openmp-report n | Show diagnostic information when compiling OpenMP programs. n specifies detail levels. |
-openmp-threadprivate type |
Specify the type of threadprivate implementation. type can be legacy (default) or compat (so it will be compatible with other compilers). |
-par-num-threads=n |
The number of threads to use. This option will override
the environmental variable OMP_NUM_THREADS.
This option must be used together with either -parallel or -openmp. |
-par-affinity |
Set the threads' CPU affinity. This option will override
the environmental variable KMP_AFFINITY.
This option must be used together with either -parallel or -openmp. |
-pthread | Enable pthread support. |
-mkl | Enable Intel Math Kernel library. |
-ipp | Enable Intel Performance Primitives library. |
-use-intel-optimized-headers | Use headers of Intel Performance Primitives library. |
-tbb | Enable Intel Threading Building Blocks library. |
-cilk-serialize | Enable serialization of Intel Cilk Plus code. |
-funsigned-char -fsigned-char | Whether by default char is signed or unsigned. |
Fortran specific
-heap-arrays 0 | Put automatic arrays on heap instead of stack. If your Fortran code crashes because of large arrays, this could help. |
Interoperability with GCC
-gcc-name=dir -cxxlib[=dir] | When used together, specify the full path of gcc C++ libraries. -cxxlib is on by default. Alternatively, one can set environmental variables GXX_ROOT and GXX_INCLUDE. |
-gcc-version=nnn | Compatibility with gcc version nnn, e.g. 340 means gcc 3.4 |
-gxx-name=dir | Specify the full path of g++ |
Preprocessor
-Dname -Dname=value |
Predefine the macro name, with value 1, or with the specified value |
-Uname | Un-define the (built-in or -D defined) macro name |
-M -MM | Output a rule (to stdout) suitable for Makefile describing the dependencies of the source file.
-MM only outputs header files not in the system header directories. This option implies -E option. |
-MF file | The output of -M is written to file. This can also be controlled by environmental variable DEPENDENCIES_OUTPUT. |
-MD | The same as -M -MF combined, but doesn't imply -E. *.d files will be generated. |
-Wp,opt | Pass opt to the C preprocessor. |
Warning messages
-Wall | Enable all warnings. |
-w | Suppress all warnings. |
-Werror | Treat warnings as errors. |
-Winline | Warn if a function can't be inlined by compiler but is declared as such in the program. |
Link
-Ldir | Also search dir for library files. This can also be controlled by environmental variable LIBRARY_PATH. |
-llibrary | Link to liblibrary. The linker searches libraries and object files in the order they are specified, so foo.o -lz bar.o searches library z after foo.o but before bar.o. If bar.o refers to functions in z, then -lz must appear AFTER bar.o. |
-s | Remove all symbol information from the executable
|
-static | Produce statically linked executable
|
-shared -fPIC -rdynamic |
Produce shared libraries. For details, see here.
-rdynamic is needed for some uses of dlopen or to allow obtaining backtraces from within a program. |
-nostartfiles | Don't link to the standard startup files (so the start point of a program is not main, but _start). To compile crt1.o, one has to use this option. Also see here for examples. |
-nodefaultlibs | Don't link to the standard system libraries (e.g. libgcc.a). |
-nostdlib | Don't link to the standard system libraries (e.g. libgcc.a) or startup files. |
-static-libgcc -shared-libgcc | Whether libgcc should be statically or dynamically linked. |
-static-intel -shared-intel | Whether Intel-provided libraries should be statically or dynamically linked. |
-Wl,opt | Pass opt to the linker. |
-Wl,-t -Wl,--print-map |
Enable linker to output trace/link map information. |
-Wl,--start-group -Wl,--end-group |
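The libraries listed between this pair are searched repeatedly until no new undefined references can be resolved, which helps when archives depend on each other. |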
Debugging
-g | Produce debugging information. |
-trapuv | Initializes stack local variables to an unusual value to aid error detection. |
-debug all | Produce complete debugging information. |
-debug parallel | Produce debugging information for the thread data sharing and reentrant call detection of the Intel Parallel Debugger Extension. |
-diag-enable sc-parallel | Enables analysis of parallelization in source code (parallel lint diagnostics). |
-diag-enable group | Enables messages of diagnostic group, which can be vec, thread, par, openmp, driver, ... |
-save-temps | Save all temporary/intermediate files produced during compiling. |
-parallel-source-info | Emit source code location when OpenMP or auto-parallelization code is generated. |
Profiling
-openmp-profile | Produce profiling information for OpenMP. A text file named guide.gvs
("Generated Values and Statistics". The file name can be specified through KMP_STATSFILE environmental variable)
will be created after the run.
See here for the explanation of guide.gvs. |
-prof-gen=srcpos | Use together with codecov for code coverage analysis. |
Optimization
-O0 -Od | Don't optimize. |
-O1 | Optimize. When any optimization option is used, __OPTIMIZE__ is defined. |
-O -O2 | Optimize even more. This is default. |
-O3 | Optimize yet more. In particular, it will try to inline a function whenever possible. |
-Os | Optimize for code size. This enables all -O2 optimizations that don't increase code size. |
-fast | The same as -ipo -O3 -no-prec-div -static -xHost. Note the -static part. |
-opt-report
-opt-report-level n | Show diagnostic information
about optimization during compilation.
n specifies detail levels, default is 1. |
-fp-model model | If model is fast=1 or fast=2
then optimize floating-point arithmetic aggressively at cost
in accuracy or consistency.
Other possible values for model are precise, except, strict, source (round intermediate results to source-defined precision), double (round intermediate results to double precision), and extended (round intermediate results to extended precision). |
-pcn | Round the significand to n bits, n can be 32, 64, 80. |
-fp-speculation=mode | Enable
compiler to speculate on floating-point operations.
mode can be fast (default mode if any optimization is on), safe, strict, or off. |
-fp-port | Round floating-point results after floating-point operations. |
-fp-stack-check | Generate extra code after every function call to ensure that the FPU register stack is in the expected state. |
-fp-relaxed | Generate fast but less accurate code sequences for math functions (Itanium only). |
-ftz | (Enabled automatically by -O3) Enable the DAZ (Denormals Are Zero) and FTZ/FZ (Flush To Zero) bits in the MXCSR register (the SSE control/status register). The effect is that denormal results from floating-point calculations will be set to 0, and denormal values used as input to floating-point instructions will be treated as 0. See here for details. |
-fpen |
(Fortran) Set n to 0 to enable
floating-point invalid, divide-by-0, overflow exceptions.
Set n to 3 (default) to disable all floating-point exceptions. |
-fast-transcendentals | Generate fast but less accurate code for transcendental functions. |
-prec-div -no-prec-div | (Fortran) Whether or not to generate slow but more accurate code for floating-point divide. |
-prec-sqrt -no-prec-sqrt | (Fortran) Whether or not to generate slow but more accurate code for floating-point square root. |
-unrolln | Unroll the loop at most n times.
To disable, use n=0. |
-prof-gen -prof-use -prof-dir | Profile guided optimization (PGO). |
-ipo | Link time/Inter-procedural optimization. |
-parallel -par-threshold n | Automatically parallelize loops.
n is a threshold from 0 to 100 on the estimated likelihood that running a loop in parallel pays off (it is not a thread count), and the default is 75. |
-par-report n | Show diagnostic information
about automatic loop parallelization during compilation.
n specifies detail levels. |
-guide n | Provide guidance/advice for automatic vectorization/parallelization/data transformation
when -parallel option is used.
n sets the verbose-ness. |
-vec -vec-threshold n | Automatically vectorize loops.
n is a threshold from 0 to 100 on the estimated likelihood that vectorizing a loop pays off (it is not a thread count), and the default (which is also the maximum) is 100. |
-vec-report n | Show diagnostic information
about loop vectorization during compilation.
n specifies detail levels, default is 1. |
-march=cpu | Generate code for specific cpu, which can be core2, pentium4, or pentium3. |
-mtune=cpu | Tune for specific cpu, e.g. core2, pentium4, pentium, pentiumpro, pentium-mmx, itanium, itanium2, etc.
The default is pentium4 on x86_64 and itanium2 on IA64. |
-xsimd -axsimd | Generate code for specific SSE/SIMD extensions. simd can be avx, sse4.2, sse4.1, ssse3, sse3, sse3_atom, sse2. -ax is similar to -x, except it also generates the non-SIMD-specific code (a=automatic processor dispatch). |
-minstruction=movbe | Generate movbe instruction if needed. |
-opt-calloc | Use optimized calloc call. |
-opt-sub-script-in-range | Assume loop indices never overflow. |
-opt-matmul | Identify matrix multiplication loop nests (if any) and replace them with a matmul library call for improved performance. |
Interesting features
-sox | Record compiler version number and command-line options in the generated objects.
To see them, use either of the following commands:
objdump -sj .comment a.out
strings -a a.out | grep comment |
-fstack-protector -fstack-protector-all | Enable protection against buffer overflows such as stack smashing attacks. |
Environmental variables
The following environmental variables will affect compilation:
GXX_ROOT GXX_INCLUDE |
Specify the location of the gcc binaries/headers. |
IA32ROOT IA64ROOT |
Specify the location of the headers/libraries for a non-standard installation structure. |
ICCCFG ICPCCFG IFORTCFG |
Specify the compiler-default-options files. |
OpenMP environmental variables
The following environmental variables will affect OpenMP programs. KMP-prefix ones are Intel-specific environmental variables (K=KAI, Kuck & Associates, Inc).
OMP-prefix ones (not shown here) are standard OpenMP environmental variables. See the previous link for the complete list.
Also see here for Intel extension routines to OpenMP.
KMP_SETTINGS | [Version 12] Set to 1 to display OpenMP run-time library environmental variables. |
KMP_AFFINITY | Set the threads' CPU affinity. See here for details. |
KMP_ALL_THREADS | Maximum number of simultaneously executing threads. |
KMP_BLOCKTIME | The time, in milliseconds, that a thread should wait, after completing the execution of a parallel region, before sleeping. The default is 200. Can be set to infinity. |
KMP_DYNAMIC_MODE | How to choose the number of threads when OMP_DYNAMIC environmental variable is set to true.
If the value is load_balance (default), then it tries to avoid using more threads than the number of available processors. If the value is thread_limit, then it tries to avoid using more threads than the total number of processors. If the value is asat, then it is based on parallel start time. |
KMP_LIBRARY | OpenMP run-time library execution mode.
If the value is throughput (default), then it is optimized for sharing resources with other programs. If the value is turnaround, then it is for dedicated use of resources, as in HPC. If the value is serial, then it enforces serial execution. |
KMP_STACKSIZE | The number of bytes (e.g. 2m, 4k) allocated for each OpenMP thread to use as the private stack for the thread. |
KMP_STATSFILE | The file name for OpenMP profiling option -openmp-profile.
The default name is guide.gvs. |
KMP_CPUINFO_FILE | Specify an alternate file name for the file containing the machine topology description.
The default is /proc/cpuinfo. |
Math Kernel Library environmental variables
The following environmental variables will affect programs using Intel Math Kernel Library. Most of them are thread control, i.e. they will only affect programs linked to multi-threaded version of MKL.
MKL_NUM_THREADS OMP_NUM_THREADS |
Set the number of threads. |
MKL_DYNAMIC | Set to FALSE to disable automatic selection of number of threads. The default is TRUE. Note that MKL will by default ignore the logical cores created by the HyperThreading technology. |
MKL_ALL MKL_BLAS MKL_FFT MKL_VML |
Set the number of threads for each function domain. Alternatively, one can also use environmental variable MKL_DOMAIN_NUM_THREADS |
MKL_SERIAL | For older versions of MKL, set it to YES to disable multithreading. |
MKL_DISABLE_FAST_MM MKL_MM_DISABLE |
Set to any value to disable fast memory management; it will cause memory to be allocated and freed from call to call, which will negatively impact performance of routines such as the level 3 BLAS functions (*gemm), especially for small problem sizes. |
I_MPI_NUMBER_OF_MPI_PROCESSES_PER_NODE I_MPI_THREAD_LEVEL |
Although these are Intel MPI environmental variables, MKL will read their values to determine the optimal number of threads to be used. |
MKL_DEBUG_CPU_TYPE | (Undocumented) Set to an integer from 0 to 4 (64-bit) or from 0 to 5 (32-bit), inclusive,
to choose the MKL math functions optimized for a specific SSE instruction set. This setting
overrides MKL's Processor Dispatch.
For 64-bit, 0 to choose the default SSE, 1 for SSE 2, 2 for Supplemental SSE 3, 3 for SSE 4.2, and 4 for AVX. For 32-bit, 0 for the default SSE, 1 for SSE 2, 2 for SSE 3, 3 for Supplemental SSE 3, 4 for SSE 4.2, and 5 for AVX. This setting has a substantial performance impact on computationally intensive BLAS functions (e.g. *gemm) for large problem sizes. Also, if your CPU does not support the SSE instruction set you specified, you could get an "illegal instruction" error. |
MKL_DEBUG_CPU_MA | (Undocumented) (Tested in version 10.2.5 and 10.3)
Set to an integer to choose the MKL
math functions optimized for a specific MicroArchitecture:
32 for Merom, 33 for Penryn, 64 for Nehalem, 66 for Westmere, 128 for
Sandy Bridge, and 0 for everything else (including AMD Barcelona).
Currently this flag is only used to distinguish between Nehalem and Westmere, that is, when MKL_DEBUG_CPU_TYPE is set for SSE 4.2, and only the BLAS functions *gemm use it (and only on very large problem sizes). The performance difference is not that great though (1-2%). In all other cases, Intel MKL infers the MicroArchitecture from the SSE instruction set and ignores the MKL_DEBUG_CPU_MA setting. For example, AVX instructions mean the MicroArchitecture is Sandy Bridge, SSE4.1 means Penryn (both Merom and Penryn support Supplemental SSE 3), etc. |
Intel C/C++ Compiler built-in macros
__cplusplus | Is defined if C++ compiler is in use. | |
__FILE__ __BASE_FILE__ |
Name of the current input file (as a C string constant)
This is ANSI C standard macro. |
|
__LINE__ | Current input line number (as an integer constant)
This is ANSI C standard macro |
|
__DATE__ __TIME__ |
Date & time on which the preprocessor is run. (as C string constants)
These are ANSI C standard macros. | |
__TIMESTAMP__ | Last modification time of the input file (as a C string constant) | |
__STDC__ __STDC_VERSION__ |
__STDC__ evaluates to 1 to mean the compiler is ISO standard conformant. __STDC_VERSION__ evaluates to a long integer constant of the form yyyymmL. __STDC__ is an ANSI C standard macro. |
|
__PIC__ __pic__ |
Evaluate to 1 if the program is compiled with -fPIC flag, i.e. position-independent code. | |
__GNUC__ __GNUC_MINOR__ __GNUC_PATCHLEVEL__ |
Evaluate to integer constants representing the GNU (C/C++/Fortran) compiler version numbers (major/minor/patch level). | |
__ICC __VERSION__ __ECC |
__ICC and __ECC evaluate to integer constants representing the Intel C/C++
compiler version number; __VERSION__ is a C string constant describing the compiler version.
__ECC is for Itanium only. |
|
__INTEL_COMPILER_BUILD_DATE | Evaluate to the Intel C compiler build date (yyyymmdd format). | |
_OPENMP | Is defined if OpenMP is in effect. | |
KMP_VERSION_MAJOR KMP_VERSION_MINOR KMP_VERSION_BUILD |
Evaluate to integer constants representing the Intel OpenMP version numbers and build date (yyyymmdd format). | |
__itanium__ __ia64__ |
Defined for Itanium. | |
__LP64__ | Defined for Linux x86_64 and Itanium. | |
__x86_64__ | Defined for x86_64. | |
__SSE__ __SSE2__ __SSE3__ __SSSE3__ |
Defined for processors that support SSE/SSE2... instructions. | |
__OPTIMIZE__ __OPTIMIZE_SIZE__ |
Is defined if any optimization flag is used. Furthermore, __OPTIMIZE_SIZE__ is defined if the optimization is for size, not speed. |
|
__GNUG__ | Defined only when compiling C++ in GNU-compatible mode; evaluates to the major version number of the GNU C++ compiler (the same value as __GNUC__). |
Intel C/C++ Compiler #pragma directives
See here for a list of Intel specific pragmas.
#pragma prefetch var1,var2... | Prefetch data in variables var1,var2... |
#pragma loop_count n1,n2... | Specify the possible numbers n1,n2... of iterations for the loop. |
#pragma unroll n | Specify how many times a loop is to be unrolled. |
#pragma unroll_and_jam n | Unroll one or more loops higher in the nest than the innermost loop, then fuse (jam) the resulting loops back together. |
#pragma swp | Specify the loop to be software-pipelined. |
#pragma poison symbol1 symbol2 ... | symbol1 symbol2 .. are (unquoted string) identifiers which will be removed during compilation. |
#pragma message string | Display string (C string constant) during compilation. |
#pragma weak symbol | Declare symbol as a weak symbol.
A better way to achieve this is through function attribute "weak". |
#pragma weak symbol1=symbol2 | Declare symbol1 as a weak alias of symbol2.
A better way to achieve this is through function attributes "weak, alias". |
Intel supplied libraries
i*libFNP | FlexNet Publisher License server manager for Intel C, C++, and Fortran compilers |
libcilkrts | Intel Cilk Plus run-time library |
libclompc libclusterguide | Intel Cluster OpenMP run-time library |
libcxaguard | For guarded initialization of static variables in C++ code. For details, see the comments in libstdc++-v3/libsupc++/guard.cc in GCC source tree and here |
libdecimal | IEEE 754-2008 Decimal floating-point arithmetic. |
libguide | Intel's legacy OpenMP run-time library. It also implements Intel's extensions (i.e. KMP_-prefix environmental variables)
Why it's called "Guide" ? Because the OpenMP-enabled KAI compiler is called Guide Compiler (e.g. guidec) |
libguide_stats | libguide for the parallelizer tool with performance statistics and profile information |
libimf | Intel's equivalent of libm.a |
libiomp5 libiompprof5 libiompstubs5 libompstub | Intel's new OpenMP run-time library. Programs compiled with non-Intel compilers can be linked to it.
The stub libraries are used when -openmp-stubs option is used (compile an OpenMP program into serial code) |
libipgo | Support library for profile-guided optimization. |
libirc | Intel optimized run-time library. It contains the infamous Processor Dispatch code. |
libirc_s | Like above, but contains SSE-specific code. |
libirml | Intel Resource Management Layer library. It is a work dispatcher used by Threading Building Blocks (TBB). |
libpdbx | Parallel debugger extension runtime library. |
libsvml | Short vector math library. |
libifcore libifcoremt libifcore_pic libifcoremt_pic | Fortran run-time library. The mt versions are for multi-threaded programs. The _pic versions allow creation of shared libraries linked to the static version of libifcore instead of the dynamic one. |
libifport | Portability & POSIX support (Fortran). |
libmkl_blacs_intelmpi20_* | BLACS routines using Intel MPI 2.x |
libmkl_blacs_intelmpi_* | BLACS routines using Intel MPI 1.x or MPICH1/2 |
libmkl_blacs_*lp64 | BLACS routines |
libmkl_blacs_openmpi_* | BLACS routines using Open MPI |
libmkl_blacs_sgimpt_* | BLACS routines using SGI Message Passing Toolkit |
libmkl_cdft_core | Cluster discrete Fourier transform routines |
libmkl_core | Math Kernel Library core functions |
libmkl_gf libmkl_gf_*lp | MKL interface library for GNU Fortran compiler |
libmkl_gnu_thread | MKL interface library for GNU OpenMP |
libmkl_intel libmkl_intel_*lp | MKL interface library for Intel compiler |
libmkl_intel_sp2dp | MKL interface library supporting Cray-style naming in
user programs targeted for the Intel 64 or IA-64 architecture and using the ILP64
convention. SP2DP interface provides a mapping between single-precision names
(for both real and complex types) in the user program and double-precision names in
Intel MKL BLAS and LAPACK. Function names are mapped as shown in the following
example for BLAS functions *gemm
sgemm -> dgemm dgemm -> dgemm cgemm -> zgemm zgemm -> zgemm |
libmkl_intel_thread | MKL interface library for Intel OpenMP |
libmkl_lapack | LAPACK routines |
libmkl_def | Default, untuned MKL, for SSE capable processors. |
libmkl_p4n | MKL for SSE 2 capable processors, i.e. Pentium 4 or better. |
libmkl_mc | MKL for Supplemental SSE 3 capable processors, i.e. Core or better. |
libmkl_mc3 | MKL for SSE4.2 capable processors, i.e. Nehalem or better. |
libmkl_avx | MKL for AVX capable processors, i.e. Sandy Bridge or better. |
libmkl_pgi_thread | MKL interface library for PGI OpenMP |
libmkl_scalapack | ScaLAPACK routines |
libmkl_sequential | Sequential version of MKL |
libmkl_solver | Iterative sparse solver, trust region solver, and GNU Multiple Precision (GMP) Arithmetic Library routines |
libmkl_vml_* | Vector math library (*=avx is for AVX, mc3/p4m3 for SSE 4.2, mc2/p4m2 for SSE 4.1, mc/p4m for Supplemental SSE3, p4p for SSE3, p4/p4n for Pentium 4) |
libmpi_mt libmpi_dbg_mt | Thread-safe MPI library (dbg=with extra debugging information) |
libmpigc3 libmpigc4 | MPI interface library for GCC 3.x/4.x compilers |
libmpigf | MPI interface library for GNU Fortran compiler |
libmpi_lustre libmpi_panfs libmpi_pvfs2 | MPI IO library tuned for Lustre, Panasas, and PVFS parallel file systems |
libtmi libtmip_mx libtmip_psm | Tag Matching Interface support for Qlogic PSM and Myricom MX interconnects |
libtvmpi | MPI debugging interface library for TotalView debugger |