clu2's notes: Notes on Valgrind

Official Valgrind website

Valgrind Command-line Options

--tool=name	Use the Valgrind tool named name. The default is memcheck.
-v	Verbose mode
-d	Show debug info
--trace-children=no|yes	Valgrind-ise child processes (follow execve)
--track-fds=no|yes	Track open file descriptors
--trace-malloc=no|yes	Trace client malloc
--time-stamp=no|yes	Add timestamps to log messages
--log-fd=number	Log messages to file descriptor number [2=stderr]
--log-file=file	Log messages to the file named file
--log-socket=ipaddr:port	Log messages to the socket at ipaddr:port
--demangle=no|yes	Automatically demangle C++ names (for certain tools only)
--read-var-info=yes|no	Use debug info on stack and global variables for better error messages (for certain tools only)
--trace-flags=XXXXXXXX	Show the generated code after the selected translation phases. XXXXXXXX is a mask of eight binary digits, one per phase: 10000000 shows the output of the 1st phase, 01000000 the output of the 2nd phase, and so on.
--trace-notbelow=number	Only show traces for code blocks numbered number or higher.
--profile-flags=XXXXXXXX	Similar to --trace-flags, but selects the phases after which the generated IR code is shown.
--debug-dump=level	Dump debug info. level can be syms, line, or frames. Must be used with the -d option.
--trace-syscalls=no|yes	Show system call details (like strace)
--trace-signals=no|yes	Show signal handling details
--trace-symtab=no|yes	Show symbol table details
--trace-cfi=no|yes	Show call-frame-info details
--trace-redir=no|yes	Show redirection details
--trace-sched=no|yes	Show thread scheduler details
--wait-for-gdb=yes|no	Pause on startup to wait for gdb attach
--sym-offsets=yes|no	Show symbols in the form 'name+offset'
--vex-iropt-verbosity=n	Show IR optimization details. n ranges from 0 to 9.
--vex-iropt-level=n	Control the IR optimization level. n ranges from 0 to 2; n=0 means no optimization.
--vex-iropt-precise-memory-exns=no|yes	Precise exception handling required.
--vex-iropt-unroll-thresh=n	Loop unrolling threshold. Default is 120.

Valgrind Core

(Disclaimer: Much of the following text is taken from the papers Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation by Nicholas Nethercote and Julian Seward, and Optimizing Binary Code Produced by Valgrind by Filipe Cabecinhas, Nuno Lopes, and Renato Crisostomo)

Valgrind's core is split into two parts: CoreGrind and VEX. VEX is responsible for dynamic code translation and for calling the tools' hooks for IR (Intermediate Representation) instrumentation, while CoreGrind is responsible for the rest (dispatching, scheduling, block cache management, symbol name demangling, etc.).

Code translation is done by VEX in the following eight phases, in order:

Code Disassembly
Conversion of the machine code to VEX's machine-independent IR. The IR is based on static single assignment (SSA) form and has some RISC-like features. Most machine instructions are disassembled into more than one IR operation.
For x86_64, this phase's source files are
- guest_amd64_toIR.c: The main functions involved are
  - disInstr_AMD64, which calls disInstr_AMD64_WRK to do the majority of disassembly.
  - disInstr_AMD64_WRK: It first checks whether the code block starts with the magic special instruction preamble (see __SPECIAL_INSTRUCTION_PREAMBLE below). If not, it starts disassembling the machine code.
  - disFPU and disMMX: These two functions are called by disInstr_AMD64_WRK to handle the non-SSE related FPU/MMX instructions. SSE instructions are decoded in disInstr_AMD64_WRK.
- libvex_ir.h: This file provides definitions and abundant explanation of VEX's machine independent IR.
  In general, the code is broken into small code blocks ("superblocks", type: IRSB). Each code block typically represents from 1 to perhaps 50 instructions. IRSBs are single-entry, multiple-exit code blocks. Each IRSB contains three things: a type environment, which indicates the type of each temporary value present in the IRSB (e.g. Ity_I1, Ity_I8, Ity_I16, .., Ity_F32, Ity_F64, Ity_V128); a list of statements, which represent code (type: IRStmt); and a jump that exits from the end of the IRSB.
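To make the three-part layout of a superblock concrete, here is a minimal sketch in C. The types and functions below (ToyType, ToyStmt, ToyIRSB, toy_new_temp, toy_add_stmt) are hypothetical simplifications invented for this note; the real IRType, IRStmt, and IRSB definitions in libvex_ir.h carry far more detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for VEX's IRType / IRStmt / IRSB. */
typedef enum { TOY_I1, TOY_I8, TOY_I16, TOY_I32, TOY_I64,
               TOY_F32, TOY_F64, TOY_V128 } ToyType;

typedef struct {
    const char *text;            /* printable form of one flat IR statement */
} ToyStmt;

#define TOY_MAX_TEMPS 16
#define TOY_MAX_STMTS 16

typedef struct {
    ToyType       temp_type[TOY_MAX_TEMPS]; /* 1. type environment: type of t0, t1, ... */
    size_t        n_temps;
    ToyStmt       stmts[TOY_MAX_STMTS];     /* 2. list of statements */
    size_t        n_stmts;
    unsigned long next_addr;                /* 3. jump taken at the end of the block */
} ToyIRSB;

/* Register a new temporary of the given type and return its index. */
size_t toy_new_temp(ToyIRSB *sb, ToyType ty)
{
    assert(sb->n_temps < TOY_MAX_TEMPS);
    sb->temp_type[sb->n_temps] = ty;
    return sb->n_temps++;
}

/* Append one statement to the end of the block. */
void toy_add_stmt(ToyIRSB *sb, const char *text)
{
    assert(sb->n_stmts < TOY_MAX_STMTS);
    sb->stmts[sb->n_stmts++].text = text;
}
```

A block is built by zero-initialising a ToyIRSB, registering its temporaries, appending statements, and finally setting next_addr to the address control continues at.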
IR Optimization
Some standard compiler optimizations are applied to the IR, including dead code elimination, constant folding, common subexpression elimination, etc.
The main source file is ir_opt.c, and the main entry function is do_iropt_BB. The level of optimization can be controlled by Valgrind's command-line option --vex-iropt-level=n, with n=0 being no optimization at all.
Instrumentation
VEX calls the Valgrind tool's hooks to instrument the code.
IR Optimization
A simpler version of the earlier optimization pass: only dead code elimination is performed.
This phase's code is inside the LibVEX_Translate function; search for the comment "Do a post-instrumentation cleanup pass".
Tree Building
Transform the flat IR to tree IR, to simplify the next phase.
The main source file is ir_opt.c, and the main entry function is ado_treebuild_BB.
Instruction Selection
Conversion of the IR to machine code. This phase still uses virtual registers.
For x86_64, this phase's source file is host_amd64_isel.c. The main function involved is iselStmt. Each statement contains expressions, which in turn involve operators (enum type IROp, defined in libvex_ir.h). For example, if an expression uses Iop_Add32Fx4 (which comes from the disassembly of the SSE instruction ADDPS in guest_amd64_toIR.c), then host_amd64_isel.c generates the machine-specific instruction AMD64Instr_Sse32Fx4 with the operator Asse_ADDF (a member of the enum type AMD64SseOp defined in host_amd64_defs.h). This instruction is also tagged with Ain_Sse32Fx4 to indicate that it is a vectorized instruction (here, summing two vectors of 4 single-precision floating-point numbers).
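The mapping from an IR operator to a tagged host instruction can be pictured with the following sketch. All names prefixed with Toy are hypothetical mirrors of the identifiers mentioned above, made up for this note; the code is not taken from host_amd64_isel.c and only shows the shape of the selection step.

```c
#include <assert.h>

/* Hypothetical mirrors of a few IROp and AMD64SseOp members. */
typedef enum { ToyIop_Add32Fx4, ToyIop_Sub32Fx4,
               ToyIop_Mul32Fx4, ToyIop_Div32Fx4 } ToyIROp;
typedef enum { ToyAsse_ADDF, ToyAsse_SUBF,
               ToyAsse_MULF, ToyAsse_DIVF } ToySseOp;
typedef enum { ToyAin_Sse32Fx4 } ToyInstrTag;

typedef struct {
    ToyInstrTag tag;       /* kind of host instruction */
    ToySseOp    op;        /* which SSE operation */
    int         src_vreg;  /* virtual registers: real ones are assigned later */
    int         dst_vreg;
} ToyInstr;

/* Select a host instruction for a 4 x 32-bit float binary IR operator. */
ToyInstr toy_isel_binop32Fx4(ToyIROp irop, int src_vreg, int dst_vreg)
{
    ToyInstr instr = { ToyAin_Sse32Fx4, ToyAsse_ADDF, src_vreg, dst_vreg };
    switch (irop) {
        case ToyIop_Add32Fx4: instr.op = ToyAsse_ADDF; break;
        case ToyIop_Sub32Fx4: instr.op = ToyAsse_SUBF; break;
        case ToyIop_Mul32Fx4: instr.op = ToyAsse_MULF; break;
        case ToyIop_Div32Fx4: instr.op = ToyAsse_DIVF; break;
        default: assert(!"unhandled IR operator");
    }
    return instr;
}
```

Note that the result still names virtual registers, which matches the statement that this phase does not yet assign real ones.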
Register Allocation
Allocates real host registers to virtual registers, using a linear scan algorithm. This phase can create additional instructions for register spills and reloads (especially on register-constrained architectures like x86).
The main source file is host_generic_reg_alloc2.c, and the main entry function is doRegisterAllocation.
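For readers unfamiliar with linear scan, here is a generic textbook-style sketch. It is not the code of doRegisterAllocation, which is considerably more involved; ToyInterval, toy_linear_scan, and the "spill whichever interval ends last" policy are assumptions chosen for illustration. Each virtual register is reduced to a live interval, and intervals are visited in order of their start position.

```c
#include <assert.h>

#define TOY_SPILLED  (-1)
#define TOY_MAX_REGS 8

typedef struct {
    int start;  /* index of the first instruction using the virtual register */
    int end;    /* index of the last instruction using it */
    int reg;    /* output: real register number, or TOY_SPILLED */
} ToyInterval;

/* Assign one of n_regs real registers to each interval.
   iv[] must be sorted by increasing start. */
void toy_linear_scan(ToyInterval *iv, int n, int n_regs)
{
    int owner[TOY_MAX_REGS];  /* owner[r]: interval currently holding r, or -1 */
    assert(n_regs > 0 && n_regs <= TOY_MAX_REGS);
    for (int r = 0; r < n_regs; r++) owner[r] = -1;

    for (int i = 0; i < n; i++) {
        int free_reg = -1, victim = -1;
        assert(i == 0 || iv[i - 1].start <= iv[i].start);
        for (int r = 0; r < n_regs; r++) {
            /* Expire intervals that ended before the current one starts. */
            if (owner[r] >= 0 && iv[owner[r]].end < iv[i].start) owner[r] = -1;
            if (owner[r] < 0) {
                if (free_reg < 0) free_reg = r;
            } else if (victim < 0 || iv[owner[r]].end > iv[owner[victim]].end) {
                victim = r;   /* register whose interval ends last */
            }
        }
        if (free_reg >= 0) {
            iv[i].reg = free_reg;
            owner[free_reg] = i;
        } else if (iv[owner[victim]].end > iv[i].end) {
            /* Spill the interval that ends last and reuse its register. */
            iv[i].reg = victim;
            iv[owner[victim]].reg = TOY_SPILLED;
            owner[victim] = i;
        } else {
            iv[i].reg = TOY_SPILLED;  /* the current interval itself ends last */
        }
    }
}
```

With two registers and the intervals [0,10], [1,3], [2,5], [4,6], the long first interval is spilled and the other three receive registers 1, 0, and 1. A real allocator must additionally emit the spill and reload instructions mentioned above.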
Code Generation
Generates the final machine code, by simply encoding the previously generated instructions and storing them to a memory block.
For x86_64, this phase's source file is host_amd64_defs.c. The main function involved is emit_AMD64Instr. It uses the information passed from phase 6 (Instruction Selection), e.g. Asse_ADDF and Ain_Sse32Fx4, to generate the machine code of ADDPS.
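As an illustration of what "encoding" means for the ADDPS example, the sketch below writes the bytes of a register-to-register ADDPS into a buffer. toy_emit_addps is a hypothetical helper written for this note, not a function from host_amd64_defs.c, and it only handles xmm0 to xmm7 so that no REX prefix is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "addps %xmm<src>, %xmm<dst>" (AT&T operand order) into buf.
   ADDPS is opcode 0F 58 /r; for a register-register form the ModRM byte is
   11 <dst> <src> in binary. Returns the number of bytes written. */
size_t toy_emit_addps(uint8_t *buf, unsigned dst, unsigned src)
{
    assert(dst < 8 && src < 8);   /* xmm8..xmm15 would need a REX prefix */
    buf[0] = 0x0F;
    buf[1] = 0x58;
    buf[2] = (uint8_t)(0xC0 | (dst << 3) | src);
    return 3;
}
```

For example, toy_emit_addps(buf, 0, 1) produces the three bytes 0F 58 C1, which a disassembler shows as addps %xmm1,%xmm0.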

Client & Valgrind

Valgrind has a trapdoor mechanism via which the client program can pass all manner of requests and queries to Valgrind and the current tool. For example, by examining the value of RUNNING_ON_VALGRIND (this is a macro defined in valgrind.h), the client program can tell if it is running on Valgrind or on a real CPU.

(See here for other trapdoors/client requests)
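A minimal sketch of the RUNNING_ON_VALGRIND check follows. It assumes the header is installed as valgrind/valgrind.h; when the header is not found, the sketch falls back to a stub definition of 0 so that it still compiles. The helper names on_valgrind and describe_environment are hypothetical.

```c
/* Use the real header when it is available; the fallback stub below is an
   assumption of this sketch, not part of Valgrind. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Nonzero when the program executes under Valgrind, 0 on a real CPU. */
unsigned on_valgrind(void)
{
    return (unsigned)RUNNING_ON_VALGRIND;
}

/* Human-readable description of where the program is running. */
const char *describe_environment(void)
{
    return on_valgrind() ? "Valgrind" : "native CPU";
}
```

A typical use is to skip very slow self-tests or to shrink a workload when on_valgrind() returns nonzero.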

This mechanism is implemented in Valgrind as follows (see Memory Debugging of MPI-Parallel Applications in Open MPI by Rainer Keller, Shiqing Fan, and Michael Resch). The client program calls VALGRIND_DO_CLIENT_REQUEST, which contains a special platform-dependent instruction preamble (__SPECIAL_INSTRUCTION_PREAMBLE in valgrind.h). Valgrind detects this preamble and steers the instrumentation accordingly. The preamble is usually a series of rotations that, taken together, leave the rotated value unchanged. On x86_64, this preamble is

   rolq $3,  %rdi
   rolq $13, %rdi
   rolq $61, %rdi
   rolq $51, %rdi

which rotates the 64-bit register %rdi left by 3, 13, 61, and 51 bits. The total is 128 bits, i.e. two full 64-bit rotations, so the register ends up with its original value.
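The claim that the four rotations cancel out is easy to check in plain C. The helpers rol64 and apply_preamble_rotations below are hypothetical names used only for this check; they model the rotations arithmetically and do not emit the actual preamble.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits, for 0 < n < 64. */
uint64_t rol64(uint64_t x, unsigned n)
{
    assert(n > 0 && n < 64);
    return (x << n) | (x >> (64 - n));
}

/* Apply the four rotation amounts of the x86_64 preamble in order. */
uint64_t apply_preamble_rotations(uint64_t x)
{
    static const unsigned amounts[] = { 3, 13, 61, 51 };  /* sum = 128 */
    for (unsigned i = 0; i < 4; i++)
        x = rol64(x, amounts[i]);
    return x;
}
```

Because 3 + 13 + 61 + 51 = 128 is a multiple of 64, apply_preamble_rotations returns its argument unchanged for every input.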

Function Wrapping

Valgrind allows calls to some specified functions to be intercepted and rerouted to a different, user-supplied function. For details, see here.

Valgrind source code annoyances

CoreGrind and many Valgrind tools wrap function and variable names in the VG_ and ML_ prefix macros. This prevents source code editors and browsers from recognizing the resulting symbols. The VG_ and ML_ macros are defined in pub_tool_basics.h. One can also verify this by running

 nm -n coregrind/libcoregrind-amd64-linux.a

which shows that VG_(str) expands to vgPlain_str and ML_(str) to vgModuleLocal_str.
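The expansion can also be reproduced in isolation with token pasting. The macro definitions below are a simplified re-creation written for this note (consult pub_tool_basics.h for the authoritative ones), and every name starting with toy_ or TOY_ is hypothetical.

```c
#include <string.h>

/* Simplified re-creation of the prefix macros via token pasting. */
#define VGAPPEND(str1, str2) str1##str2
#define VG_(str) VGAPPEND(vgPlain_, str)
#define ML_(str) VGAPPEND(vgModuleLocal_, str)

/* Two-level stringification, so the argument is macro-expanded first. */
#define TOY_STR_(x) #x
#define TOY_STR(x)  TOY_STR_(x)

/* A function defined through the macro; its real symbol name is
   vgPlain_toy_answer, which is what nm would list. */
int VG_(toy_answer)(void) { return 42; }

/* The names the preprocessor actually produces. */
const char *toy_expanded_vg_name(void) { return TOY_STR(VG_(message)); }
const char *toy_expanded_ml_name(void) { return TOY_STR(ML_(helper)); }

/* Returns 1 if both macros expand to the expected prefixed names. */
int toy_names_match(void)
{
    return strcmp(toy_expanded_vg_name(), "vgPlain_message") == 0
        && strcmp(toy_expanded_ml_name(), "vgModuleLocal_helper") == 0;
}
```

This is why an editor that does not run the preprocessor cannot connect a call written as VG_(toy_answer)() with the symbol vgPlain_toy_answer.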

To fix this, in UltraEdit, open the Replace dialog, enable Unix-style Regular Expressions, and replace

VG_\(([a-zA-Z0-9_]+)\)

with

vgPlain_\1

and replace

ML_\(([a-zA-Z0-9_]+)\)

with

vgModuleLocal_\1
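For those without UltraEdit, the same two substitutions can be sketched as a small C routine. This is an alternative written for this note, not part of Valgrind; toy_expand_prefixes and ident_len are hypothetical names. Like the regular expressions above, it rewrites only the form MACRO(identifier), where the identifier consists of letters, digits, and underscores.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Length of the run of [a-zA-Z0-9_] characters starting at s (0 if none). */
static size_t ident_len(const char *s)
{
    size_t n = 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}

/* Copy src to dst, rewriting VG_(name) to vgPlain_name and ML_(name) to
   vgModuleLocal_name. Returns 0 on success, -1 if dst is too small. */
int toy_expand_prefixes(const char *src, char *dst, size_t dst_size)
{
    static const struct { const char *macro; const char *prefix; } rules[] = {
        { "VG_(", "vgPlain_" },
        { "ML_(", "vgModuleLocal_" },
    };
    size_t out = 0;
    assert(dst_size > 0);
    while (*src) {
        int matched = 0;
        for (size_t r = 0; r < 2 && !matched; r++) {
            size_t mlen = strlen(rules[r].macro);
            if (strncmp(src, rules[r].macro, mlen) != 0) continue;
            size_t ilen = ident_len(src + mlen);
            if (ilen == 0 || src[mlen + ilen] != ')') continue;
            size_t plen = strlen(rules[r].prefix);
            if (out + plen + ilen >= dst_size) return -1;
            memcpy(dst + out, rules[r].prefix, plen);       /* new prefix */
            memcpy(dst + out + plen, src + mlen, ilen);     /* identifier */
            out += plen + ilen;
            src += mlen + ilen + 1;                         /* skip ')' too */
            matched = 1;
        }
        if (!matched) {
            if (out + 1 >= dst_size) return -1;
            dst[out++] = *src++;                            /* plain copy */
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Applied line by line to a source file, this turns VG_(debugLog)(1, ...) into vgPlain_debugLog(1, ...), the form shown in the launcher code later in these notes.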

Adding new tools or new source files to Valgrind

See this link.

The required autogen.sh can be downloaded here.

Change the default tool from memcheck to your own tool

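Modify coregrind/launcher-linux.c. In main(), find the following block and replace memcheck with the name of your tool, both in the debug message and in the toolname assignment: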

   if (toolname) {
      vgPlain_debugLog(1, "launcher", "tool '%s' requested\n", toolname);
   } else {
      vgPlain_debugLog(1, "launcher",
                          "no tool requested, defaulting to 'memcheck'\n");
      toolname = "memcheck";
   }