PGI Compiler Suite
pgcc | C compiler driver. |
pgCC | C++ compiler driver. |
pgf95 pgf77 pgfortran | Fortran compiler driver. |
pghpf | High Performance Fortran compiler driver. |
pgdbg | Debugger. |
pgcollect pgprof | Profiler. |
pgcpuid | Display the CPU type the compiler sees and display the default -tp switch it will use. |
pgaccelinfo | Display the accelerator GPU the compiler sees. |
File extensions
.c | C source files. |
.f/for/f90/f95 | Fortran source files. |
.F/FOR/F90/F95 | Fortran source files (containing macros) to be processed by the Fortran preprocessor. |
.hpf | High Performance Fortran source files. |
.cuf | Fortran source files with CUDA extensions. |
.CUF | Fortran source files with CUDA extensions to be processed by the Fortran preprocessor. |
.h | C/C++ header files. |
.i | Preprocessed C source files. |
.C/cc | C++ source files. |
.s | Assembler code. |
.d | Dependency files. They contain rules suitable for a Makefile describing the dependencies of the source file. Created by the -MD option. |
Now the compiler...
Beginning with version 7.0, default compiler options can be placed in ~/.mypgirc (applies to every PGI compiler), ~/.mypgccrc (C compiler), ~/.mypgcpprc (C++ compiler), ~/.mypgfortranrc (Fortran compiler), etc. The file should contain something like:
    append PREOPTIONS=-fast;
    append POSTOPTIONS=-Mipa;
(Notice the semicolons.) That is, you can set at most two default compiler options: one that precedes everything on the command line, and one that follows everything on the command line. If the file has more than one append PREOPTIONS=.. or append POSTOPTIONS=.. line, only the FIRST occurrence is used. The options cannot contain spaces: instead of -tp barcelona-64, you must write -tp=barcelona-64. Moreover, not all command-line options are accepted; for example, -### is not allowed.
For details, see Technical Problem Report 3985 and here.
Compile
-c | Compile *.c and assemble *.s. NO linking. |
-Idir | Also search dir for header files. This can also be controlled by environmental variables C_INCLUDE_PATH and CPLUS_INCLUDE_PATH. |
-S | Compile *.c into assembly code *.s. NO linking. |
-Manno | Make the generated assembly code more readable. |
-E | Run preprocessor only. The output is sent to stdout. |
-C | When running preprocessor, don't discard comments in the program. |
-dM | Display definitions of all built-in macros. |
-o file | Place output in file |
-v | When compiling, also display the programs invoked by the compiler. |
-dryrun -### | Display the programs invoked by the driver and exit. |
-drystdin | Display standard header directories and exit. |
-show | Display detailed information of current driver. |
-V | Display the version number. |
-# | When compiling, also display the programs invoked by the compiler. |
-help=hidden | Display all available compiler switches, including the hidden & undocumented ones (Yes, PGI has many of them!) |
C/C++ dialect
-A | Follow strict ANSI C++ standard. |
-a | Follow proposed ANSI C++ standard. |
-B | Accept C++ style comments in C code. |
--gnu_extensions | Accept GNU extensions. |
-mp -mp=mode | Enable OpenMP. mode can be align, allcores, bind, nonuma, numa (use thread-CPU affinity). |
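As a rough illustration of what -mp acts on, the sketch below is a C function whose loop carries an OpenMP pragma. The function name sum_squares is hypothetical. Built with pgcc -mp, the loop may be divided among threads; built without -mp, the pragma is normally ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical example function: returns the sum of i*i for i = 1..n. */
long sum_squares(int n)
{
    long total = 0;
    int i;

    assert(n >= 0);

    /* Honored only when OpenMP is enabled (e.g. with -mp);
       otherwise the pragma is normally ignored and the loop is serial. */
    #pragma omp parallel for reduction(+:total)
    for (i = 1; i <= n; i++)
        total += (long)i * i;

    return total;
}
```

The reduction clause gives each thread a private copy of total and combines the copies at the end of the loop, so the result does not depend on the number of threads.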
Preprocessor
-Dname -Dname=value | Predefine the macro name, with value 1 or with the specified value. |
-Uname | Un-define the (built-in or -D defined) macro name |
-M -MM | Output a rule (to stdout) suitable for a Makefile describing the dependencies of the source file. -MM leaves out header files found in the system header directories. This option implies the -E option. |
-MD | The same as -M, but *.d files will be generated. |
-MMD | The same as -MM, but *.d files will be generated. |
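A minimal sketch of how -Dname and -Dname=value are typically consumed on the C side. The macro names BUFFER_SIZE and VERBOSE and both function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* BUFFER_SIZE and VERBOSE are hypothetical macro names.
   Compiling with -DBUFFER_SIZE=1024 overrides the default below. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256
#endif

/* Returns the buffer size selected at compile time. */
int buffer_size(void)
{
    assert(BUFFER_SIZE > 0);
    return BUFFER_SIZE;
}

/* Returns 1 when the file was compiled with VERBOSE defined
   (e.g. -DVERBOSE), and 0 otherwise. */
int verbose_enabled(void)
{
#ifdef VERBOSE
    return 1;
#else
    return 0;
#endif
}
```

Compiled without any -D switch, buffer_size() returns 256 and verbose_enabled() returns 0. Adding -UVERBOSE after -DVERBOSE on the command line would be expected to switch the second function back to 0.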
Warning messages
-Minform=warn | Show warning messages. |
-w | Suppress all warnings. |
Link
-Ldir | Also search dir for library files. This can also be controlled by environmental variable LIBRARY_PATH. |
-llibrary | Link to liblibrary. The linker searches libraries and object files in the order they are specified, so foo.o -lz bar.o searches library z after foo.o but before bar.o. If bar.o refers to functions in z, then -lz must appear AFTER bar.o. |
-s | Remove all symbol information from the executable. |
-Bstatic | Produce statically linked executable. |
-shared -fPIC -r | Produce shared libraries. For details, see here. |
-Mnostartup | Don't link to the standard startup files (so the start point of a program is not main, but _start). To compile crt1.o, one has to use this option. Also see here for examples. |
-Mnostdlib | Don't link to the standard system libraries (e.g. libgcc.a) or startup files. |
-Bstatic_pgi -Bdynamic | Whether PGI-provided libraries should be statically or dynamically linked. |
-pgcpplibs -pgf77libs -pgf90libs | Link to C++, PGF77, or PGF90 runtime libraries. |
-Mmpi=mpilib | Link to MPI library. mpilib can be mpich1, mpich2, hpmpi, mvapich1. |
-Mscalapack | Link to ScaLAPACK library. |
-Rdir | Tell linker to add dir to the runtime shared/dynamic libraries search path. |
-Wl,opt | Pass opt to the linker. |
-rpath=dir | Tell linker to add dir to the runtime shared/dynamic libraries search path. |
-m | Enable linker to output trace/link map information. |
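-Wl,--start-group -Wl,--end-group | Pass --start-group and --end-group to the linker. The archives listed between this pair are searched repeatedly until no new undefined references remain, which resolves circular dependencies between libraries. |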
Debugging
-g | Produce debugging information. |
-gopt | Produce debugging information in the presence of optimization. |
-Mkeepasm | Save all temporary/intermediate assembly files produced during compiling. |
-traceback | Add debug information for runtime traceback. Should be used together with -Meh_frame. Set the environmental variable PGI_TERM to trace to enable the stack traceback on error. |
Profiling
-pg | Produce profiling information for pgprof. |
-Mprof=option | Produce profiling information for pgprof. option can be func, hwcts (PAPI must be installed), lines, mpich1, mpich2, mvapich1. |
Optimization
-O0 | Don't optimize. |
-O1 | Optimize. |
-O2 | Optimize even more. This is default. |
-O3 -O4 | Optimize yet more. |
-fast | This implies -O2 and other optimizations such as loop unrolling, SSE instructions, loop redundancy elimination (LRE), partial redundancy elimination (PRE), Flush To Zero (FTZ) & Denormals Are Zero (DAZ) modes, etc. |
-Msmart | Invoke a post-pass assembly instruction scheduling optimization. |
-Mdaz | Treat denormal values used as input to floating-point instruction as 0. |
-Mflushz | Set denormal results from floating-point calculations to 0. |
-Mfprelaxed | Generate fast but less accurate code for math functions (division, reciprocal, square root, reciprocal square root, etc) |
-Mfapprox | Generate fast but low-precision code for math functions (division, reciprocal, reciprocal square root) |
-Kieee | Perform floating-point operations in strict conformance with the IEEE 754 standard. Some optimizations are disabled. |
-Minline | Enable function inlining. |
-Mipa=fast,inline | Link time/Inter-procedural optimization. |
-Minfo -Minfo=lvl | Display compile-time optimization information. lvl can be all, ccff, ftn, ipa, loop, lre, mp, opt, par, pfo, unroll, vect. Note: CCFF means "Common Compiler Feedback Format". |
-Mneginfo | Display messages explaining why certain optimizations were disabled at compile time. |
-Mchkfpstk | Generate extra code after every function call to ensure that the FPU register stack is in the expected state. |
-Msmartalloc=huge | Link to the huge page runtime library. |
-Mpfi -Mpfo | Profile guided optimization (PGO). |
-Mconcur | Automatically parallelize loops. |
-Mvect | Automatically vectorize loops. |
-tp cpu | Generate code for specific cpu, e.g. athlon, barcelona, barcelona-64, core2-64, istanbul-64, nehalem-64, p7-64, penryn-64, shanghai-64 ... |
-help=target | List all cpu which can be used in "-tp cpu" switch. |
-ta=nvidia,sub_options | Generate code for NVIDIA accelerator with specific sub_options, e.g. cc20, cuda2.3, cuda3.0, fastmath... |
-pc=n | Round the significand to n bits, n can be 32, 64, 80. |
-W0,-beta -# | (Undocumented) Enable beta release optimizations. |
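The effect of -Mflushz (also turned on by -fast) and -Kieee on denormal results can be probed from C. The sketch below is an assumption-laden illustration, not a PGI-provided test: both function names are hypothetical. It halves the smallest normal double at run time and reports whether the denormal result was flushed to zero.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: halves a positive value at run time.
   volatile keeps the compiler from folding the division at compile time. */
double halve_at_runtime(double x)
{
    volatile double v = x;

    assert(x > 0.0);
    v = v / 2.0;
    return v;
}

/* Returns 1 if DBL_MIN / 2 came back as exactly zero (the denormal
   result was flushed), and 0 if the denormal value was preserved. */
int denormal_result_flushed(void)
{
    return halve_at_runtime(DBL_MIN) == 0.0;
}
```

Under strict IEEE 754 arithmetic, DBL_MIN / 2 is a nonzero denormal, so denormal_result_flushed() returns 0. In a program built with flush-to-zero enabled it would be expected to return 1.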
Miscellaneous features
-Mchkstk | Generate code to check for sufficient stack space upon subprogram entry. |
-Mbounds | Generate code to check array bounds |
-Mbyteswapio | (Fortran) Swap byte-order (big-endian to little-endian or vice versa) during I/O of Fortran unformatted data. |
-Mchkptr | (Fortran) Check for NULL pointers. |
-Mcray | (Fortran) Enable Cray compatibility mode. |
-Mcuda -Mcuda=emu | (Fortran) Enable CUDA Fortran. Enable emulation mode. |
Run-time environmental variables
In addition to the standard OpenMP run-time environmental variables, the following variables also affect run-time behavior of PGI-compiled programs.
NCPUS | (OpenMP) Specify the number of processes or threads used in parallel regions. |
NCPUS_MAX | (OpenMP) Specify the maximal number of processes or threads used in parallel regions. |
MP_SPIN | (OpenMP) Specify the number of times to check a semaphore before calling sched_yield (on Linux or Mac OS X) or _sleep (on Windows). |
MP_BIND | (OpenMP) Set to y to use thread-CPU affinity (binding processes or threads to a physical core/processor). |
MP_BLIST | (OpenMP) If MP_BIND is set to y, this variable specifically defines the thread-CPU relationship, overriding the default values. |
MPSTKZ | (OpenMP) Specify the number of bytes (e.g. 2m, 4m) allocated for each thread to use as the private stack for the thread. |
PGI_HUGE_PAGES | Specify the number of huge pages (2 MB). The purpose of huge pages is to reduce TLB cache misses. |
ACML_FAST_MALLOC | Set to 1 to use optimized memory management for the BLAS function dgemm in ACML. This is a new feature introduced in ACML version 4.4.0. |
ACML_FAST_MALLOC_CHUNK_SIZE ACML_FAST_MALLOC_MAX_CHUNKS | These two parameters further fine-tune the behavior of ACML_FAST_MALLOC. By default the limit is set to 64 chunks of size 10,000,000 bytes. |
ACML_FAST_MALLOC_DEBUG | Set to any value to display the debugging information of ACML_FAST_MALLOC. |
NO_STOP_MESSAGE | (Fortran) Set to any value to disable FORTRAN STOP message when STOP is called. |
FORTRANOPT | (Fortran) This controls Fortran I/O behavior. Its value is a comma-separated list of options. |
PGI_TERM | This controls the stack trace-back and just-in-time debugging. Its value is a comma-separated list of options. |
PGI_TERM_DEBUG | This controls how the debugger is invoked. For example, it can be set to gdb --quiet --pid %d to use GDB instead. |
PGI_STACK_USAGE STAKSTAT | Set to any value to display the stack usage when the program ends. |
PGI C Compiler built-in macros
__cplusplus | Defined if the C++ compiler is in use. |
__FILE__ __BASE_FILE__ | Name of the current input file (as a C string constant). __FILE__ is an ANSI C standard macro. |
__LINE__ | Current input line number (as an integer constant). This is an ANSI C standard macro. |
__DATE__ __TIME__ | Date and time at which the preprocessor is run (as C string constants). These are ANSI C standard macros. |
__TIMESTAMP__ | Last modification time of the input file (as a C string constant). |
__STDC__ __STDC_VERSION__ | __STDC__ evaluates to 1 to mean the compiler is ISO standard conformant. __STDC_VERSION__ evaluates to a long integer constant of the form yyyymmL. __STDC__ is an ANSI C standard macro. |
__PGIC__ __PGIC_MINOR__ __PGIC_PATCHLEVEL__ | Evaluate to integer constants representing the PGI compiler version numbers (major/minor/patch level). |
__PGI | Defined for the PGI compiler. |
__x86_64__ __amd64__ | Defined for x86_64. |
__MMX__ __SSE__ __SSE2__ __SSE3__ __SSSE3__ | Defined for processors that support MMX/SSE/SSE2... instructions. |
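The macros above are typically used for compile-time compiler detection. A minimal sketch follows; the helper names describe_compiler and stdc_version are hypothetical, and the PGI branch only takes effect when the code is built with a PGI compiler.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a short description of the compiler
   into buf and returns buf. */
const char *describe_compiler(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
#if defined(__PGI)
    /* PGI compiler: report major.minor.patchlevel. */
    snprintf(buf, size, "PGI %d.%d.%d",
             __PGIC__, __PGIC_MINOR__, __PGIC_PATCHLEVEL__);
#else
    snprintf(buf, size, "non-PGI compiler");
#endif
    return buf;
}

/* Returns __STDC_VERSION__ (a long of the form yyyymmL),
   or 0 if the macro is not defined. */
long stdc_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}
```

The complete list of macros a given installation predefines can be checked with the -dM switch described earlier.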