PGI Compiler Suite
| pgcc | C compiler driver. |
| pgCC | C++ compiler driver. |
| pgf95 pgf77 pgfortran | Fortran compiler driver. |
| pghpf | High Performance Fortran compiler driver. |
| pgdbg | Debugger. |
| pgcollect pgprof | Profiler. |
| pgcpuid | Display the CPU type the compiler detects and the default -tp switch it will use. |
| pgaccelinfo | Display the accelerator (GPU) the compiler detects. |
File extensions
| .c | C source files. |
| .f/for/f90/f95 | Fortran source files. |
| .F/FOR/F90/F95 | Fortran source files (containing macros) to be processed by the Fortran preprocessor. |
| .hpf | High Performance Fortran source files. |
| .cuf | Fortran source files with CUDA extensions. |
| .CUF | Fortran source files with CUDA extensions to be processed by the Fortran preprocessor. |
| .h | C/C++ header files. |
| .i | Preprocessed C source files. |
| .C/cc | C++ source files. |
| .s | Assembler code. |
| .d | Dependency files, created by the -MD option. They contain Makefile rules describing the dependencies of the source file. |
Now the compiler...
Beginning with version 7.0, default compiler options can be placed in ~/.mypgirc (for every PGI compiler), ~/.mypgccrc (for the C compiler), ~/.mypgcpprc (for the C++ compiler), ~/.mypgfortranrc (for the Fortran compiler), etc. The file should contain something like:
    append PREOPTIONS=-fast;
    append POSTOPTIONS=-Mipa;
(Notice the semicolons.) That is, you can set at most two default compiler options: one is placed before everything else on the command line, and the other after everything else. If there is more than one append PREOPTIONS=... or append POSTOPTIONS=..., only the FIRST occurrence is used. Also note that the options cannot contain spaces. For example, instead of -tp barcelona-64 you must write -tp=barcelona-64. Moreover, not every command-line option is allowed here. For example, -### is not.
For details, see Technical Problem Report 3985 and here.
Compile
| -c | Compile *.c and assemble *.s. NO linking. |
| -Idir | Also search dir for header files. This can also be controlled by the environment variables C_INCLUDE_PATH and CPLUS_INCLUDE_PATH. |
| -S | Compile *.c into assembly code *.s. NO linking. |
| -Manno | Annotate the generated assembly code with the source code, making it more readable. |
| -E | Run preprocessor only. The output is sent to stdout. |
| -C | When running the preprocessor, keep comments in the program instead of discarding them. |
| -dM | Display definitions of all built-in macros. |
| -o file | Place output in file. |
| -v | When compiling, also display the programs invoked by the compiler. |
| -dryrun -### | Display the programs invoked by the driver and exit. |
| -drystdin | Display standard header directories and exit. |
| -show | Display detailed information about the current driver configuration. |
| -V | Display the version number. |
| -# | When compiling, also display the programs invoked by the compiler. |
| -help=hidden | Display all available compiler switches, including the hidden & undocumented ones (Yes, PGI has many of them!) |
C/C++ dialect
| -A | Follow strict ANSI C++ standard. |
| -a | Follow proposed ANSI C++ standard. |
| -B | Accept C++ style comments in C code. |
| --gnu_extensions | Accept GNU extensions. |
| -mp -mp=mode | Enable OpenMP. mode can be align, allcores, bind, nonuma, or numa (use thread-CPU affinity). |
Preprocessor
| -Dname -Dname=value | Predefine the macro name, with value 1 or with the specified value. |
| -Uname | Undefine the (built-in or -D defined) macro name. |
| -M -MM | Output (to stdout) a rule suitable for a Makefile describing the dependencies of the source file. -MM lists only header files that are not in the system header directories. This option implies -E. |
| -MD | The same as -M, but *.d files will be generated. |
| -MMD | The same as -MM, but *.d files will be generated. |
Warning messages
| -Minform=warn | Show warning messages. |
| -w | Suppress all warnings. |
Link
| -Ldir | Also search dir for library files. This can also be controlled by the environment variable LIBRARY_PATH. |
| -llibrary | Link to liblibrary. The linker searches libraries and object files in the order they are specified, so foo.o -lz bar.o searches library z after file foo.o but before bar.o. Therefore, if bar.o refers to functions in z, -lz must appear AFTER bar.o. |
| -s | Remove all symbol information from the executable. |
| -Bstatic | Produce a statically linked executable. |
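| -shared -fPIC -r | -shared produces a shared library, -fPIC generates the position-independent code that shared libraries need, and -r produces a relocatable object file (a partial link). For details, see here. |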
| -Mnostartup | Don't link the standard startup files, so the program's entry point is _start rather than main (normally _start is supplied by crt1.o and then calls main). One has to use this option when building one's own crt1.o. Also see here for examples. |
| -Mnostdlib | Don't link to the standard system libraries (e.g. libgcc.a) or startup files. |
| -Bstatic_pgi -Bdynamic | Whether PGI-provided libraries should be statically or dynamically linked. |
| -pgcpplibs -pgf77libs -pgf90libs | Link to C++, PGF77, or PGF90 runtime libraries. |
| -Mmpi=mpilib | Link to MPI library. mpilib can be mpich1, mpich2, hpmpi, mvapich1. |
| -Mscalapack | Link to ScaLAPACK library. |
| -Rdir | Tell linker to add dir to the runtime shared/dynamic libraries search path. |
| -Wl,opt | Pass opt to the linker. |
| -rpath=dir | Tell linker to add dir to the runtime shared/dynamic libraries search path. |
| -m | Enable linker to output trace/link map information. |
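| -Wl,--start-group -Wl,--end-group | The linker searches the archives listed between this pair repeatedly until no new undefined references are resolved. This handles circular dependencies between libraries. |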
Debugging
| -g | Produce debugging information. |
| -gopt | Produce debugging information in the presence of optimization. |
| -Mkeepasm | Save all temporary/intermediate assembly files produced during compiling. |
| -traceback | Add debug information for runtime traceback. Should be used together with -Meh_frame. Set the environment variable PGI_TERM to trace to enable the stack traceback on error. |
Profiling
| -pg | Produce profiling information for pgprof. |
| -Mprof=option | Produce profiling information for pgprof. option can be func, hwcts (PAPI must be installed), lines, mpich1, mpich2, or mvapich1. |
Optimization
| -O0 | Don't optimize. |
| -O1 | Optimize. |
| -O2 | Optimize even more. This is default. |
| -O3 -O4 | Optimize yet more. |
| -fast | This implies -O2 and other optimizations such as loop unrolling, SSE instructions, loop redundancy elimination (LRE), partial redundancy elimination (PRE), Flush To Zero (FTZ) & Denormals Are Zero (DAZ) modes, etc. |
| -Msmart | Invoke a post-pass assembly instruction scheduling optimization. |
| -Mdaz | Treat denormal values used as input to floating-point instructions as 0. |
| -Mflushz | Set denormal results from floating-point calculations to 0. |
| -Mfprelaxed | Generate fast but less accurate code for math functions (division, reciprocal, square root, reciprocal square root, etc.). |
| -Mfapprox | Generate fast but low-precision code for math functions (division, reciprocal, reciprocal square root) |
| -Kieee | Perform floating-point operations in strict conformance with the IEEE 754 standard. Some optimizations are disabled. |
| -Minline | Enable function inlining. |
| -Mipa=fast,inline | Link-time/interprocedural optimization. |
| -Minfo -Minfo=lvl | Display compile-time optimization information. lvl can be all, ccff, ftn, ipa, loop, lre, mp, opt, par, pfo, unroll, vect, etc. Note: CCFF means "Common Compiler Feedback Format". |
| -Mneginfo | Display messages explaining why certain optimizations were not performed at compile time. |
| -Mchkfpstk | Generate extra code after every function call to ensure that the FPU register stack is in the expected state. |
| -Msmartalloc=huge | Link to the huge page runtime library. |
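| -Mpfi -Mpfo | Profile-guided optimization (PGO): -Mpfi builds an instrumented executable that collects profile data when run, and -Mpfo uses that data when recompiling. |
| -Mconcur | Automatically parallelize loops. |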
| -Mvect | Automatically vectorize loops. |
| -tp cpu | Generate code for specific cpu, e.g. athlon, barcelona, barcelona-64, core2-64, istanbul-64, nehalem-64, p7-64, penryn-64, shanghai-64 ... |
| -help=target | List all CPU types that can be used with the "-tp cpu" switch. |
| -ta=nvidia,sub_options | Generate code for NVIDIA accelerator with specific sub_options, e.g. cc20, cuda2.3, cuda3.0, fastmath... |
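| -pc=n | Set the x87 floating-point precision. n can be 32, 64, or 80 (single, double, or extended precision, i.e. the significand is rounded to 24, 53, or 64 bits). |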
| -W0,-beta -# | (Undocumented) Enable beta release optimizations. |
Miscellaneous features
| -Mchkstk | Generate code to check for sufficient stack space upon subprogram entry. |
| -Mbounds | Generate code to check array bounds. |
| -Mbyteswapio | (Fortran) Swap byte-order (big-endian to little-endian or vice versa) during I/O of Fortran unformatted data. |
| -Mchkptr | (Fortran) Check for NULL pointers. |
| -Mcray | (Fortran) Enable Cray compatibility mode. |
| -Mcuda -Mcuda=emu | (Fortran) -Mcuda enables CUDA Fortran; -Mcuda=emu enables it in emulation mode. |
Run-time environment variables
In addition to the standard OpenMP run-time environment variables, the following variables also affect the run-time behavior of PGI-compiled programs.
| NCPUS | (OpenMP) Specify the number of processes or threads used in parallel regions. |
| NCPUS_MAX | (OpenMP) Specify the maximal number of processes or threads used in parallel regions. |
| MP_SPIN | (OpenMP) Specify the number of times to check a semaphore before calling sched_yield (on Linux or Mac OS X) or _sleep (on Windows). |
| MP_BIND | (OpenMP) Set to y to use thread-CPU affinity (binding processes or threads to a physical core/processor). |
| MP_BLIST | (OpenMP) If MP_BIND is set to y, this variable specifically defines the thread-CPU relationship, overriding the default values. |
| MPSTKZ | (OpenMP) Specify the number of bytes (e.g. 2m, 4m) allocated for each thread to use as the private stack for the thread. |
| PGI_HUGE_PAGES | Specify the number of huge pages (2 MB each). The purpose of huge pages is to reduce TLB misses. |
| ACML_FAST_MALLOC | Set to 1 to use optimized memory management for the BLAS function dgemm in ACML. This feature was introduced in ACML version 4.4.0. |
| ACML_FAST_MALLOC_CHUNK_SIZE ACML_FAST_MALLOC_MAX_CHUNKS | These two parameters further fine-tune the behavior of ACML_FAST_MALLOC. By default the limit is 64 chunks of 10,000,000 bytes each. |
| ACML_FAST_MALLOC_DEBUG | Set to any value to display the debugging information of ACML_FAST_MALLOC. |
| NO_STOP_MESSAGE | (Fortran) Set to any value to disable FORTRAN STOP message when STOP is called. |
| FORTRANOPT | (Fortran) This controls Fortran I/O behavior. Its value is a comma-separated list of options, which can be: |
| PGI_TERM | This controls stack traceback and just-in-time debugging. Its value is a comma-separated list of options, which can be: |
| PGI_TERM_DEBUG | This controls how the debugger is invoked. For example, it can be set to gdb --quiet --pid %d to use GDB instead. |
| PGI_STACK_USAGE STAKSTAT | Set to any value to display the stack usage when the program ends. |
PGI C Compiler built-in macros
| __cplusplus | Defined when the C++ compiler is in use. |
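| __FILE__ __BASE_FILE__ | __FILE__ is the name of the current input file and __BASE_FILE__ is the name of the main source file given on the command line (both as C string constants). __FILE__ is an ANSI C standard macro. |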
| __LINE__ | Current input line number (as an integer constant). This is an ANSI C standard macro. |
| __DATE__ __TIME__ | Date and time at which the preprocessor is run (as C string constants). These are ANSI C standard macros. |
| __TIMESTAMP__ | Last modification time of the input file (as a C string constant). |
| __STDC__ __STDC_VERSION__ | __STDC__ evaluates to 1 to indicate that the compiler conforms to the ISO C standard. __STDC_VERSION__ evaluates to a long integer constant of the form yyyymmL (e.g. 199901L). __STDC__ is an ANSI C standard macro. |
| __PGIC__ __PGIC_MINOR__ __PGIC_PATCHLEVEL__ | Evaluate to integer constants representing the PGI compiler version numbers (major/minor/patch level). |
| __PGI | Defined for the PGI compiler. |
| __x86_64__ __amd64__ | Defined for x86_64. |
| __MMX__ __SSE__ __SSE2__ __SSE3__ __SSSE3__ | Defined for processors that support MMX/SSE/SSE2... instructions. |