PGI compiler suite reference card

PGI Compiler Suite

pgcc C compiler driver.
pgCC C++ compiler driver.
pgf95
pgf77
pgfortran
Fortran compiler driver.
pghpf High Performance Fortran compiler driver.
pgdbg Debugger.
pgcollect
pgprof
Profiler.
pgcpuid Display the CPU type the compiler sees and display the default -tp switch it will use.
pgaccelinfo Display the accelerator GPU the compiler sees.

File extensions

.c C source files.
.f/for/f90/f95 Fortran source files.
.F/FOR/F90/F95 Fortran source files (containing macros) to be processed by the Fortran preprocessor.
.hpf High Performance Fortran source files.
.cuf Fortran source files with CUDA extensions.
.CUF Fortran source files with CUDA extensions to be processed by the Fortran preprocessor.
.h C/C++ header files.
.i Preprocessed C source files.
.C/cc C++ source files.
.s Assembler code.
.d Dependency files. They contain rules suitable for a Makefile describing the dependencies of the source file.

Created by -MD option.

Now the compiler...

Beginning with version 7.0, default compiler options can be placed in the ~/.mypgirc file (for every PGI compiler), the ~/.mypgccrc file (for the C compiler), the ~/.mypgcpprc file (for the C++ compiler), the ~/.mypgfortranrc file (for the Fortran compiler), etc. The file should contain something like:
append PREOPTIONS=-fast;
append POSTOPTIONS=-Mipa;
(Notice the semicolons.) That is, you can set at most two default compiler options: one that precedes everything on the command line, and one that follows everything on the command line. If there is more than one append PREOPTIONS=.. or append POSTOPTIONS=.. line, only the FIRST occurrence is used. Also note that options cannot contain spaces. For example, instead of -tp barcelona-64, you must write -tp=barcelona-64. Moreover, not every command-line option can be used. For example, -### is not allowed.

For details, see Technical Problem Report 3985.

Compile

-c Compile *.c and assemble *.s. NO linking.
-Idir Also search dir for header files.

This can also be controlled by the environmental variables C_INCLUDE_PATH and CPLUS_INCLUDE_PATH.

-S Compile *.c into assembly code (*.s). NO linking.
-Manno Make the generated assembly code more readable.
-E Run the preprocessor only. The output is sent to stdout.
-C When running the preprocessor, don't discard comments in the program.
-dM Display definitions of all built-in macros.
-o file Place output in file.
-v When compiling, also display the programs invoked by the compiler.
-dryrun
-###
Display the programs invoked by the driver and exit.
-drystdinc Display standard header directories and exit.
-show Display detailed information about the current driver.
-V Display the version number.
-# When compiling, also display the programs invoked by the compiler.
-help=hidden Display all available compiler switches, including the hidden and undocumented ones (yes, PGI has many of them!).

C/C++ dialect

-A Follow the strict ANSI C++ standard.
-a Follow the proposed ANSI C++ standard.
-B Accept C++ style comments in C code.
--gnu_extensions Accept GNU extensions.
-mp
-mp=mode
Enable OpenMP.

mode can be align, allcores, bind, nonuma, numa (use thread-CPU affinity).
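As a rough illustration of what -mp acts on, the sketch below is a loop annotated with a standard OpenMP directive. The function names sum_of_squares and openmp_enabled are hypothetical and chosen only for this example. Built with something like pgcc -mp, the loop would be expected to run across several threads. Without -mp the pragma is ignored and the loop runs serially with the same result.

```c
/* Sum 1*1 + 2*2 + ... + n*n.
 * The pragma asks an OpenMP-enabled build to split the iterations
 * across threads and combine the per-thread partial sums into total. */
long sum_of_squares(int n)
{
    long total = 0;
    int i;

    #pragma omp parallel for reduction(+:total)
    for (i = 1; i <= n; i++) {
        total += (long)i * i;
    }
    return total;
}

/* _OPENMP is the macro that the OpenMP standard requires a compiler
 * to define when OpenMP support is switched on. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

Calling sum_of_squares(10) returns 385 in both the serial and the parallel build. openmp_enabled() is one way to confirm that the switch actually took effect.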

Preprocessor

-Dname
-Dname=value
Predefine the macro name, with value 1 or with the specified value.
-Uname Undefine the (built-in or -D defined) macro name.
-M
-MM
Output a rule (to stdout) suitable for Makefile describing the dependencies of the source file.

-MM only outputs header files not in the system header directories.

This option implies -E option.

-MD The same as -M, but *.d files will be generated.
-MMD The same as -MM, but *.d files will be generated.
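To make the effect of -D concrete, here is a minimal sketch of source code written to be configured from the command line. The macro names BUFFER_SIZE and VERBOSE and both function names are hypothetical. They are not PGI built-ins.

```c
/* BUFFER_SIZE is a hypothetical macro. Compiling with -DBUFFER_SIZE=1024
 * overrides the fallback value below. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif

/* Report the buffer size that was selected at compile time. */
int buffer_size(void)
{
    return BUFFER_SIZE;
}

/* VERBOSE is a hypothetical macro. Compiling with -DVERBOSE defines it
 * with the value 1, so this function then returns 1. */
int verbose_enabled(void)
{
#ifdef VERBOSE
    return VERBOSE;
#else
    return 0;
#endif
}
```

Compiled with no -D switches, buffer_size() returns 256 and verbose_enabled() returns 0. Adding -DBUFFER_SIZE=1024 -DVERBOSE to the command line would be expected to change these to 1024 and 1.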

Warning messages

-Minform=warn Show warning messages.
-w Suppress all warnings.

Link

-Ldir Also search dir for library files. This can also be controlled by the environmental variable LIBRARY_PATH.
-llibrary Link to liblibrary.

The linker searches libraries and object files in the order they are specified, so

    foo.o -lz bar.o

will search library z after file foo.o but before bar.o, so if bar.o refers to functions in z, then -lz must appear AFTER bar.o

-s Remove all symbol information from the executable.

-Bstatic Produce a statically linked executable.

-shared
-fPIC
-r
Produce shared libraries.
-Mnostartup Don't link to the standard startup files (so the start point of a program is not main, but _start).

To compile crt1.o, one has to use this option.

-Mnostdlib Don't link to the standard system libraries (e.g. libgcc.a) or startup files.
-Bstatic_pgi
-Bdynamic
Whether PGI-provided libraries should be statically or dynamically linked.
-pgcpplibs
-pgf77libs
-pgf90libs
Link to C++, PGF77, or PGF90 runtime libraries.
-Mmpi=mpilib Link to MPI library.

mpilib can be mpich1, mpich2, hpmpi, mvapich1.

-Mscalapack Link to ScaLAPACK library.
-Rdir Tell the linker to add dir to the runtime shared/dynamic library search path.
-Wl,opt Pass opt to the linker.
-rpath=dir Tell the linker to add dir to the runtime shared/dynamic library search path.
-m Enable the linker to output trace/link map information.
-Wl,--start-group
-Wl,--end-group
The libraries listed between this pair are searched repeatedly by the linker until no new undefined references are resolved, which handles circular dependencies between libraries.

Debugging

-g Produce debugging information.
-gopt Produce debugging information in the presence of optimization.
-Mkeepasm Save all temporary/intermediate assembly files produced during compiling.
-traceback Add debug information for runtime traceback. Should be used together with -Meh_frame.

Set the environmental variable PGI_TERM to trace to enable the stack trace-back on error.

Profiling

-pg Produce profiling information for pgprof.
-Mprof=option Produce profiling information for pgprof. option can be func, hwcts (PAPI must be installed), lines, mpich1, mpich2, mvapich1.

Optimization

-O0 Don't optimize.
-O1 Optimize.
-O2 Optimize even more.

This is default.

-O3
-O4
Optimize yet more.
-fast This implies -O2 and other optimizations such as loop unrolling, SSE instructions, loop redundancy elimination (LRE), partial redundancy elimination (PRE), Flush To Zero (FTZ) & Denormals Are Zero (DAZ) modes, etc.
-Msmart Invoke a post-pass assembly instruction scheduling optimization.
-Mdaz Treat denormal values used as input to floating-point instruction as 0.
-Mflushz Set denormal results from floating-point calculations to 0.
-Mfprelaxed Generate fast but less accurate code for math functions (division, reciprocal, square root, reciprocal square root, etc)
-Mfapprox Generate fast but low-precision code for math functions (division, reciprocal, reciprocal square root)
-Kieee Perform floating-point operations in strict conformance with the IEEE 754 standard. Some optimizations are disabled.
-Minline Enable function inlining.
-Mipa=fast,inline Link time/Inter-procedural optimization.
-Minfo
-Minfo=lvl
Display compile-time optimization information

lvl can be all, ccff, ftn, ipa, loop, lre, mp, opt, par, pfo, unroll, vect, ...

Note: CCFF means "Common Compiler Feedback Format".

-Mneginfo Display messages explaining why certain optimizations were disabled at compile time.
-Mchkfpstk Generate extra code after every function call to ensure that the FPU register stack is in the expected state.
-Msmartalloc=huge Link to the huge page runtime library.
-Mpfi
-Mpfo
Profile guided optimization (PGO).
-Mconcur Automatically parallelize loops.
-Mvect Automatically vectorize loops.
-tp cpu Generate code for a specific cpu, e.g. athlon, barcelona, barcelona-64, core2-64, istanbul-64, nehalem-64, p7-64, penryn-64, shanghai-64 ...
-help=target List all cpu values that can be used in the "-tp cpu" switch.
-ta=nvidia,sub_options Generate code for an NVIDIA accelerator with specific sub_options, e.g. cc20, cuda2.3, cuda3.0, fastmath...
-pc=n Round the significand to n bits; n can be 32, 64, or 80.
-W0,-beta -# (Undocumented) Enable beta release optimizations.
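The -Mdaz and -Mflushz switches listed above (both implied by -fast) change how denormal numbers are handled. The sketch below is one possible way to observe this from C. The function name denormals_preserved is hypothetical. It is expected to return 1 in a build that follows IEEE 754 gradual underflow and 0 in a build where flush-to-zero or denormals-are-zero is active. That second case is an assumption drawn from the switch descriptions and has not been checked against a PGI build.

```c
#include <float.h>

/* Returns 1 if a denormal value survives an arithmetic operation and a
 * comparison, or 0 if it was treated as zero somewhere along the way. */
int denormals_preserved(void)
{
    /* volatile keeps the compiler from evaluating the division at
     * compile time, so the run-time floating-point mode is what counts. */
    volatile double smallest_normal = DBL_MIN;

    /* Half of the smallest normal double is a denormal under IEEE 754.
     * In flush-to-zero mode this result becomes 0. */
    volatile double half = smallest_normal / 2.0;

    /* In denormals-are-zero mode the denormal input to this comparison
     * is read as 0, so the test fails in that case as well. */
    return half > 0.0;
}
```

A check of this kind can be useful when results differ between a -fast build and a -Kieee build and denormal handling is a suspect.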

Miscellaneous features

-Mchkstk Generate code to check for sufficient stack space upon subprogram entry.
-Mbounds Generate code to check array bounds.
-Mbyteswapio (Fortran) Swap byte-order (big-endian to little-endian or vice versa) during I/O of Fortran unformatted data.
-Mchkptr (Fortran) Check for NULL pointers.
-Mcray (Fortran) Enable Cray compatibility mode.
-Mcuda
-Mcuda=emu
(Fortran) Enable CUDA Fortran.

-Mcuda=emu enables emulation mode.

Run-time environmental variables

In addition to the standard OpenMP run-time environmental variables, the following variables also affect run-time behavior of PGI-compiled programs.

NCPUS (OpenMP) Specify the number of processes or threads used in parallel regions.
NCPUS_MAX (OpenMP) Specify the maximal number of processes or threads used in parallel regions.
MP_SPIN (OpenMP) Specify the number of times to check a semaphore before calling sched_yield (on Linux or Mac OS X) or _sleep (on Windows).
MP_BIND (OpenMP) Set to y to use thread-CPU affinity (binding processes or threads to a physical core/processor).
MP_BLIST (OpenMP) If MP_BIND is set to y, this variable specifically defines the thread-CPU relationship, overriding the default values.
MPSTKZ (OpenMP) Specify the number of bytes (e.g. 2m, 4m) allocated for each thread to use as the private stack for the thread.
PGI_HUGE_PAGES Specify the number of huge pages (2 MB).

The purpose of huge pages is to reduce TLB cache misses.

ACML_FAST_MALLOC Set to 1 to use optimized memory management for the BLAS function dgemm in ACML.

This is a new feature introduced in ACML version 4.4.0.

ACML_FAST_MALLOC_CHUNK_SIZE
ACML_FAST_MALLOC_MAX_CHUNKS
These two parameters further fine tune the behavior of ACML_FAST_MALLOC. By default the limit is set to 64 chunks of size 10,000,000 bytes.
ACML_FAST_MALLOC_DEBUG Set to any value to display the debugging information of ACML_FAST_MALLOC.
NO_STOP_MESSAGE (Fortran) Set to any value to disable FORTRAN STOP message when STOP is called.
FORTRANOPT (Fortran) This controls Fortran I/O behavior. Its value is a comma-separated list of options, which can be:
  • vaxio: Use VAX I/O conventions.
  • crlf: Interpret DOS/Windows style \r\n (carriage return and line feed) as new line.
  • format_relaxed: An I/O item corresponding to a numerical edit descriptor (such as F, E, I) is not required to be a type implied by the descriptor.
PGI_TERM This controls the stack trace-back and just-in-time debugging. Its value is a comma-separated list of options, which can be:
  • debug: Invoke the debugger on error. By default it invokes pgdbg, but it can be set to use other debuggers. See the PGI_TERM_DEBUG environmental variable.
  • trace: Enable stack trace-back.
  • abort: Enable core dump when abort is called.
Each option can be disabled (which is the default) by prefixing it with no, e.g. noabort.
PGI_TERM_DEBUG This controls how the debugger is invoked. For example, it can be set to
gdb --quiet --pid %d
to use GDB instead.
PGI_STACK_USAGE
STAKSTAT
Set to any value to display the stack usage when the program ends.

PGI C Compiler built-in macros

__cplusplus Is defined if C++ compiler is in use.
__FILE__
__BASE_FILE__
Name of the current input file (as a C string constant)

__FILE__ is an ANSI C standard macro.

__LINE__ Current input line number (as an integer constant)

This is an ANSI C standard macro.

__DATE__
__TIME__
Date & time on which the preprocessor is run. (as C string constants)

These are ANSI C standard macros.

__TIMESTAMP__ Last modification time of the input file (as a C string constant)
__STDC__
__STDC_VERSION__
__STDC__ evaluates to 1 to indicate that the compiler conforms to the ISO standard.

__STDC_VERSION__ evaluates to a long integer constant of the form yyyymmL.

__STDC__ is an ANSI C standard macro.

__PGIC__
__PGIC_MINOR__
__PGIC_PATCHLEVEL__
Evaluate to integer constants representing the PGI compiler version numbers (major/minor/patch level).
__PGI Defined for PGI compiler.
__x86_64__
__amd64__
Defined for x86_64.
__MMX__
__SSE__
__SSE2__
__SSE3__
__SSSE3__
Defined for processors that support MMX/SSE/SSE2... instructions.
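The macros above are typically used in conditional compilation. The following is a minimal sketch of that pattern. The function names and the packing of the version into a single integer (major * 10000 + minor * 100 + patch level) are illustrative choices made here, not a PGI convention.

```c
/* Returns the PGI version packed into one integer, or 0 when the code
 * is built with a compiler that does not define the PGI macros. */
int pgi_version_number(void)
{
#if defined(__PGI) && defined(__PGIC__)
    return __PGIC__ * 10000 + __PGIC_MINOR__ * 100 + __PGIC_PATCHLEVEL__;
#else
    return 0;
#endif
}

/* Returns 1 when the compiler targets x86_64, 0 otherwise. */
int target_is_x86_64(void)
{
#if defined(__x86_64__) || defined(__amd64__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when the compiler may generate SSE2 instructions for the
 * selected target, 0 otherwise. */
int target_has_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}
```

Because every PGI-specific macro is tested with defined() or #ifdef before it is used, the same source still compiles with other compilers, where pgi_version_number() simply returns 0.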