How GCC generates optimized code for printf (and GCC built-in functions)

(Most of the material here is from here)

Consider the following simple C code:

 #include <stdio.h>

 void main() {
   printf("Hello World\n");
 }
When this is compiled with GCC 4.5 using the -O0 option (i.e. with optimizations turned off), the generated assembly code is:
.LC0:
    .string "Hello World"
    .text
.globl main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %edi
    call    puts
    leave
    ret
If one compiles the same C code again with the -fno-builtin option, the call to printf is kept and the generated assembly code is:
.LC0:
    .string "Hello World\n"
    .text
.globl main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %eax
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    leave
    ret
If one removes the \n from the original C code and compiles with the -O0 option (without -fno-builtin), the generated assembly code also calls printf:
.LC0:
    .string "Hello World"
    .text
.globl main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %eax
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf
    leave
    ret

This printf-to-puts/putchar optimization is implemented in the file gcc/builtins.c in the GCC source tree; see the fold_builtin_printf function. The rules that decide which optimization to perform are as follows (the order in which the rules are checked matters):

  1. Return value is used: No optimization.
  2. The format string is not a literal constant: No optimization.
  3. The format string has no % in it:
    1. The format string has length 0: Eliminate this printf call.
    2. The format string has length 1: Call putchar
    3. The format string is of form "string\n": Create a NUL-terminated string that's one char shorter than the original (the trailing newline is dropped), and call puts. (This is our C code example above)
    4. Otherwise, no optimization.
  4. The format string is of form "%s\n": Call puts.
  5. The format string is of form "%c": Call putchar.
  6. Otherwise, no optimization.
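
The rule list above can be restated as a small decision function. This is only a sketch of the rules as written here, not GCC's actual code: the enum, the function name classify_printf, and the convention that a NULL format stands for "not a literal constant" are all hypothetical, and the argument checks GCC performs for "%s\n" and "%c" are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only; none of this appears in GCC. */
enum printf_fold {
    FOLD_NONE,       /* keep the printf call as it is */
    FOLD_ELIMINATE,  /* remove the call entirely */
    FOLD_PUTCHAR,    /* replace the call with putchar */
    FOLD_PUTS        /* replace the call with puts */
};

/* fmt is the format string, or NULL when it is not a literal constant
   (an assumption of this sketch). result_used is nonzero when the
   caller uses the return value of printf. */
enum printf_fold classify_printf(const char *fmt, int result_used)
{
    size_t len;

    if (result_used)                       /* rule 1 */
        return FOLD_NONE;
    if (fmt == NULL)                       /* rule 2 */
        return FOLD_NONE;

    len = strlen(fmt);
    if (strchr(fmt, '%') == NULL) {        /* rule 3: no % in the format */
        if (len == 0)
            return FOLD_ELIMINATE;         /* rule 3.1 */
        if (len == 1)
            return FOLD_PUTCHAR;           /* rule 3.2 */
        if (fmt[len - 1] == '\n')
            return FOLD_PUTS;              /* rule 3.3 */
        return FOLD_NONE;                  /* rule 3.4 */
    }
    if (strcmp(fmt, "%s\n") == 0)          /* rule 4 */
        return FOLD_PUTS;
    if (strcmp(fmt, "%c") == 0)            /* rule 5 */
        return FOLD_PUTCHAR;
    return FOLD_NONE;                      /* rule 6 */
}
```

For the two examples in this article, classify_printf("Hello World\n", 0) selects puts, while classify_printf("Hello World", 0) selects no optimization, which matches the assembly listings shown earlier.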

There are many similar built-in function optimizations. For example, the fold_builtin_sprintf function would use strcpy instead in some cases.
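
As a rough illustration of the sprintf case, the two calls below have the shapes that, as far as I know, are candidates for being turned into a plain string copy: a "%s" format with one string argument, and a literal format containing no %. The helper names are hypothetical, and whether a given GCC version actually emits strcpy (or an inline copy) here is an assumption that is best checked by compiling with -S and reading the output.

```c
#include <stdio.h>

/* Hypothetical helper names, for illustration only. */

/* "%s" format with a single string argument and an unused return value:
   the formatted result is just a copy of src. */
void copy_with_format(char *dst, const char *src)
{
    sprintf(dst, "%s", src);
}

/* Literal format with no '%' in it: the result is a copy of the literal. */
void copy_literal(char *dst)
{
    sprintf(dst, "Hello World");
}
```

Either way the observable behavior is the same as with the unoptimized sprintf call; only the function that ends up being called may differ.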

The fold_builtin_cos function computes the result at compile time when the argument is a constant, using the MPFR library (this is why MPFR is needed to build GCC itself). It also changes cos(-x) into cos(x).

The fold_builtin_tan function likewise computes the result when the argument is a constant. In addition, if the -funsafe-math-optimizations compiler option is used (so that the variable flag_unsafe_math_optimizations is true), it changes tan(atan(x)) to x directly.

Here is a list of non-math functions (from the GCC source file gcc/builtins.def) that GCC can optimize, either by transformation (e.g. changing printf to puts), by compile-time calculation, or by inline assembly code:

alloca, bzero, fprintf, fputs, isascii, isdigit, memcmp, memcpy, mempcpy, memset, printf, sprintf, stpcpy, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncat, strncmp, strncpy, strpbrk, strrchr, strspn, strstr, toascii, va_copy, va_end, va_start

The complete list of built-in functions provided by GCC is here.

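If one wants to use the optimized version of one of the above functions explicitly, without relying on the -O2 compiler command-line option, one can call the __builtin_-prefixed version of the function, e.g. __builtin_alloca, __builtin_bzero, and so on. The prefixed names refer to the built-in even when -fno-builtin is given.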
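
A minimal sketch of calling the prefixed versions directly is shown below. __builtin_strlen and __builtin_memcpy are real GCC built-ins (Clang accepts them too); the wrapper names length_of and copy_bytes are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical wrapper names, for illustration only. */

/* Ask for the built-in strlen explicitly. With a constant argument the
   compiler may compute the length at compile time. */
size_t length_of(const char *s)
{
    return __builtin_strlen(s);
}

/* Ask for the built-in memcpy explicitly. For small constant sizes the
   compiler may expand the copy inline instead of calling the library. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    return __builtin_memcpy(dst, src, n);
}
```

For example, length_of("Hello World") yields 11, the same value the library strlen would return; the built-in only changes how the result is produced, not what it is.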

Actually, the header files use this approach. For example, in string.h, one can see code like this:

# ifdef __OPTIMIZE__
__extern_always_inline void *
memchr (void *__s, int __c, size_t __n) __THROW
{
  return __builtin_memchr (__s, __c, __n);
}
...
# endif
GCC is not the only compiler that does this. The Intel C Compiler also provides optimized versions of the following functions (see its libirc.so):

calloc, memcmp, memcpy, memmove, memset, memzero, strlen

And AMD provides an optimized string library for the following functions:

ffsll (finds the position of the first (least significant) bit set in the word; presumably this function exploits AMD's "Advanced Bit Manipulation" instruction "lzcnt"), index, memchr, rindex, strchr, strlen, strnlen, strrchr