Consider the following simple C code:
```c
#include <stdio.h>

void main()
{
    printf("Hello World\n");
}
```

When compiled with GCC 4.5 using the -O0 option (i.e. with optimizations turned off), the generated assembly code is:
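```asm
.LC0:
        .string "Hello World"
        .text
        .globl  main
        .type   main, @function
main:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $.LC0, %edi
        call    puts
        leave
        ret
```

Even with optimizations turned off, GCC has replaced the printf call with puts, and the trailing newline has been dropped from the string constant because puts appends one itself. If one compiles the C code again with the -fno-builtin option, the generated assembly code is: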
```asm
.LC0:
        .string "Hello World\n"
        .text
        .globl  main
        .type   main, @function
main:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $.LC0, %eax
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf
        leave
        ret
```

If one instead removes the \n from the original C code and compiles with -O0 (without -fno-builtin), the generated assembly code is:
```asm
.LC0:
        .string "Hello World"
        .text
        .globl  main
        .type   main, @function
main:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $.LC0, %eax
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf
        leave
        ret
```
This printf-to-puts/putchar optimization is implemented in the source file gcc/builtins.c of the GCC source tree, in the fold_builtin_printf function. It decides which transformation to perform using the following rules, checked in order (the order matters):
- The return value is used: no optimization.
- The format string is not a literal constant: no optimization.
- The format string contains no %:
  - The format string has length 0: eliminate the printf call.
  - The format string has length 1: call putchar.
  - The format string has the form "string\n": create a NUL-terminated copy that is one character shorter (i.e. without the trailing newline) and call puts with it. (This is the case in the C example above.)
  - Otherwise: no optimization.
- The format string is "%s\n": call puts with the argument.
- The format string is "%c": call putchar with the argument.
- Otherwise: no optimization.
There are many similar built-in function optimizations. For example, the fold_builtin_sprintf function replaces sprintf with strcpy in some cases, such as when the format string contains no % directives or is exactly "%s".
The fold_builtin_cos function evaluates the result at compile time when the argument is a constant (using the MPFR library, which is why MPFR is needed to build GCC itself), and rewrites cos(-x) as cos(x).
The fold_builtin_tan function likewise evaluates the result at compile time when the argument is a constant and, when the -funsafe-math-optimizations option is given (so that the internal variable flag_unsafe_math_optimizations is true), rewrites tan(atan(x)) directly as x.
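Why one rewrite needs an "unsafe" flag and the other does not can be seen from the identities involved. Negating the argument of cos never changes the mathematical result, whereas tan(atan(x)) = x holds only in exact arithmetic. A worked example, assuming IEEE 754 double precision, with fl(·) denoting rounding to the nearest double and assuming accurate atan and tan implementations:

```latex
\begin{aligned}
&\cos(-x) = \cos(x) && \text{for every real } x \text{ (cosine is even)}\\
&\tan(\arctan x) = x && \text{for every real } x \text{, in exact arithmetic}\\
&x = 10^{300}:\quad \arctan x \approx \tfrac{\pi}{2} - 10^{-300}
  \;\xrightarrow{\ \mathrm{fl}\ }\; \mathrm{fl}\!\left(\tfrac{\pi}{2}\right) \approx \tfrac{\pi}{2} - 6.12\times 10^{-17}\\
&\tan\!\left(\mathrm{fl}\!\left(\tfrac{\pi}{2}\right)\right) \approx \frac{1}{6.12\times 10^{-17}} \approx 1.63\times 10^{16} \neq 10^{300}
\end{aligned}
```

So for large x the unoptimized code would return roughly 1.63e16, while the rewritten code returns x itself; the rewrite changes observable results, which is why it is gated behind -funsafe-math-optimizations.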
Here is a list of non-math functions (from the GCC source file gcc/builtins.def) that GCC may optimize, whether by transformation (e.g. changing printf to puts), by compile-time evaluation, or by inline code expansion:
alloca, bzero, fprintf, fputs, isascii, isdigit, memcmp, memcpy, mempcpy, memset, printf, sprintf, stpcpy, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, toascii, va_copy, va_end, va_start
The complete list of built-in functions provided by GCC is documented in the GCC manual (see "Other Built-in Functions Provided by GCC").
If one wants GCC's built-in handling of the above functions regardless of command-line options (for example, even when built-in recognition is turned off with -fno-builtin), one can call the __builtin_-prefixed versions of these functions, e.g. __builtin_alloca, __builtin_bzero, and so on.
The glibc header files use this approach too. For example, string.h contains code like this:
```c
# ifdef __OPTIMIZE__
__extern_always_inline void *
memchr (void *__s, int __c, size_t __n) __THROW
{
  return __builtin_memchr (__s, __c, __n);
}
...
# endif
```

GCC is not alone in this. The Intel C Compiler also has optimized versions of the following functions (see its libirc.so):
calloc, memcmp, memcpy, memmove, memset, memzero, strlen
And AMD provides an optimized string library for the following functions:
ffsll (find the position of the first, i.e. least significant, set bit in a long long; this function can presumably exploit the lzcnt instruction from AMD's "Advanced Bit Manipulation" (ABM) extension), index, memchr, rindex, strchr, strlen, strnlen, strrchr