LCOV with LLVM

The Linux package lcov is a set of Perl scripts that converts gcov coverage information into nice-looking HTML pages in which a project’s coverage metrics are concisely visible. Fortunately, the Clang compiler is also capable of generating gcov-compatible data, so lcov can be used with the LLVM tool chain as well. To get LLVM working together with lcov, the following steps have to be performed:

  1. Get the latest version of lcov, at least 1.12, from the project’s web page
  2. Follow the instructions here

Macro Definitions

When trying to analyze a compiler and/or build error involving macro definitions, e.g. the nasty feature test macros, it can be useful to find out where a macro has been defined and which value it takes. A straightforward approach is to look at the compiler output after pre-processing, typically obtained by adding the -E option. Unfortunately, this only shows the final pre-processed output in which all macros have already been expanded. Fortunately, clang and gcc provide an additional option that prints the pre-processed output together with the macro definitions and the locations where they are defined. To demonstrate this, we take the following source program:
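
For illustration, a small program of this kind might mix a locally defined macro with one coming from a system header (file and macro names are purely illustrative):

    /* example.c: MY_BUFFER_SIZE is defined locally, BUFSIZ comes from <stdio.h> */
    #include <stdio.h>

    #define MY_BUFFER_SIZE 64

    int main(void)
    {
      char buffer[MY_BUFFER_SIZE];
      snprintf(buffer, sizeof(buffer), "BUFSIZ is %d", BUFSIZ);
      puts(buffer);
      return 0;
    }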

To obtain the pre-processed output including the macro definitions, we have to add the -E and -dD arguments to the compile command. Since -E prints the pre-processed output to stdout, we redirect it into a file, typically with the file extension .i:

The pre-processed file then looks as follows. Please note that the file also includes built-in macros, e.g. __x86_64__, most of which are omitted from the output below for clarity:
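
Assuming the example above is pre-processed with e.g. clang -E -dD example.c > example.i, the resulting file has roughly the following shape (heavily abridged; the exact line markers, built-in macros and values vary by compiler and system):

    # 1 "example.c"
    # 1 "<built-in>" 1
    #define __x86_64__ 1
    # 1 "example.c" 2
    # 1 "/usr/include/stdio.h" 1
    #define BUFSIZ 8192
    # 4 "example.c" 2
    #define MY_BUFFER_SIZE 64

    int main(void)
    {
      char buffer[MY_BUFFER_SIZE];
      ...
    }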

C++ Factory Method with Shared Libraries

Suppose that you are working on a generic framework and you would like to allow the users of the framework to extend it with domain-specific functionality. A programming pattern that is typically applied in this scenario is the so-called factory method pattern. The core components of this pattern are an (abstract) interface class, a set of derived classes and a factory method that creates an object of the requested type and returns it cast to the interface class. A very simplistic implementation of the factory method could be the following:
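
A sketch of such a hard-coded factory is shown below; the concrete names (Interface, ClassA, ClassB, factoryMethod) and the string-based type selection are assumptions for illustration:

    #include <iostream>
    #include <memory>
    #include <stdexcept>
    #include <string>

    // Abstract interface that all domain-specific classes implement.
    class Interface {
    public:
      virtual ~Interface() = default;
      virtual void doSomething() = 0;   // hypothetical domain functionality
    };

    class ClassA : public Interface {
    public:
      void doSomething() override { std::cout << "ClassA\n"; }
    };

    class ClassB : public Interface {
    public:
      void doSomething() override { std::cout << "ClassB\n"; }
    };

    // Hard-coded factory: every new type requires modifying this function.
    std::unique_ptr<Interface> factoryMethod(const std::string& type) {
      if (type == "ClassA")
        return std::unique_ptr<Interface>(new ClassA());
      if (type == "ClassB")
        return std::unique_ptr<Interface>(new ClassB());
      throw std::runtime_error("unknown type: " + type);
    }

    int main(int argc, char* argv[]) {
      std::unique_ptr<Interface> object = factoryMethod(argc > 1 ? argv[1] : "ClassA");
      object->doSomething();
      return 0;
    }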

If you are working on a very small, possibly company-internal, framework, it might be acceptable to share your complete code base with all programmers and to let them modify the above code when adding new type classes, e.g. ClassC, ClassD, etc. In turn, if your code base is fairly large, e.g. with overall build times longer than 30 minutes, or if you do not want to share the code base, e.g. due to IP restrictions, you might want users of your framework to provide the functionality for new types as shared libraries that are built out of source and loaded at runtime when requested. This post sketches how this could be implemented in C++ for the prototypical use case above.

The first ingredient of our recipe is the interface definition, which consists of the Interface class plus an evil registration macro that streamlines and unifies the plugin handling:
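
A sketch of what this header could look like (the header name and the method names are assumptions; the macro and the createPlugin entry point follow the description below):

    // Interface.h -- sketch of the framework's public header
    #pragma once

    class Interface {
    public:
      virtual ~Interface() = default;
      virtual void doSomething() = 0;   // hypothetical domain functionality
    };

    // Every plugin library uses this macro exactly once. It defines the
    // well-known entry point createPlugin with C linkage, i.e. without C++
    // name mangling, so that the framework can look it up via dlsym(3).
    #define PLUGIN_CLASS(type)                 \
      extern "C" Interface* createPlugin() {   \
        return new type();                     \
      }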

As we will see later in this post, every shared library is required to expose one well-known entry point so that it can be loaded by our framework. This entry point is created by the macro PLUGIN_CLASS. Note the use of extern “C”, which disables C++ name mangling so that the function is exported under the plain name createPlugin. Every plugin is required to use the macro PLUGIN_CLASS exactly once! Given the interface, we next define our shared library for ClassA:
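
A minimal plugin implementation might look like this (file name, output text and the build command in the comment are assumptions):

    // ClassA.cpp -- built out of source into e.g. libclassa.so
    // possible build command: g++ -std=c++11 -shared -fPIC ClassA.cpp -o libclassa.so
    #include "Interface.h"

    #include <iostream>

    class ClassA : public Interface {
    public:
      void doSomething() override { std::cout << "Hello from ClassA\n"; }
    };

    // Creates the exported entry point createPlugin for this library.
    PLUGIN_CLASS(ClassA)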

Next, we compile the above code as a shared library and inspect its exported symbols with the nm(1) utility to check that our entry point is available:

Note that the capital T (for the text section) in the output of nm(1) indicates a defined and exported function. Having defined the prerequisites for our framework, we now take a look at the new factory method:
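
A sketch of the new factory method and the surrounding main program is shown below; error handling, the reinterpret_cast and the names are assumptions, but the listing is laid out so that the line numbers referenced in the walk-through that follows line up (built e.g. with g++ -std=c++11 main.cpp -o main -ldl):

    #include "Interface.h"
    #include <dlfcn.h>
    #include <functional>
    #include <memory>
    #include <string>
    using Signature = Interface*();                    // signature of the plugin entry point
    using PluginFunction = std::function<Signature>;   // matching std::function type

    std::unique_ptr<Interface> factoryMethod(const std::string& fileName) {
      void* handle = dlopen(fileName.c_str(), RTLD_LAZY);      // load the shared library
      if (handle == nullptr) {
        return nullptr;
      }
      void* symbol = dlsym(handle, "createPlugin");            // query the entry point's address
      if (symbol == nullptr) return nullptr;
      PluginFunction create = reinterpret_cast<Interface* (*)()>(symbol);
      return std::unique_ptr<Interface>(create());             // create the object via the plugin
    }

    int main(int argc, char* argv[]) {
      if (argc < 2) return 1;                                  // expects the plugin's file name
      std::unique_ptr<Interface> object = factoryMethod(argv[1]);
      if (object) object->doSomething();
      return 0;
    }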

The main function is the same as before with the sole exception that our main program requires the file name of a shared library as its first program argument. The new factory method factoryMethod takes the file name of the shared library, but still returns a unique pointer to the Interface class. Inside the factory method, the central parts are the calls to the functions dlopen(3) and dlsym(3), which allow us to load a shared library at runtime and to query the address of a symbol inside this library, respectively. But let’s look at the code line by line. In order to map the function’s address to a C++ function object, i.e. a std::function, we define the function signature (line 6) and the respective std::function type (line 7). The shared library is then opened in line 10 by dlopen(3), which returns an opaque library handle. The first parameter to dlopen(3) is the path or file name of the shared library, while the second parameter configures when unresolved symbols inside the shared library are resolved. The two possible settings are RTLD_LAZY to perform the resolution only when a symbol is referenced, and RTLD_NOW to perform the resolution immediately at load time. We choose the former for performance reasons. In line 14 we finally query the shared library, identified by the opaque library handle, for the address of the function createPlugin, which is cast to the std::function in line 16 and eventually called in line 17.

Compiling and running the code then yields the desired results:

Library Call Interception

In the context of system analysis and system trouble-shooting, the tracing and interception of individual function calls, e.g. system calls, from user-space processes might be required or at least useful. On a system running an up-to-date version of Linux, probing can be done with SystemTap or, more specifically for functions of the malloc family, with malloc hooks. This post shows a Unix-generic solution to this problem that relies on symbol overriding and the pre-loading of shared libraries at runtime. While the approach is not tailored to Linux, the examples were compiled and executed on an Ubuntu 14.04 system. The approach is also known to work on AIX, HP-UX and Solaris.

A specific use case in which the approach below proved useful was the analysis of memory leaks in a program that performed a huge number of small memory allocations which, in total, summed up to many gigabytes. Because of the high pressure on the dynamic memory allocator, approaches like compiler instrumentation (modern tools such as the LLVM/Clang LeakSanitizer were not yet available) or in-depth program and heap analysis, e.g. with Valgrind, were too slow to be applicable. Our solution was to perform high-speed tracing of the malloc(3), calloc(3), realloc(3) and free(3) functions and to postpone the leak analysis to an off-line process running on the recorded runtime data.

As a first demonstration we try to intercept calls to malloc(3) and free(3) and compile the two interception functions into a shared library called libintercept.so. To start, we first look up the function declarations of malloc(3) and free(3), for example by checking their respective man pages. The signatures of the two functions, as declared in <stdlib.h>, are:
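
    /* from <stdlib.h>, cf. malloc(3) and free(3) */
    void *malloc(size_t size);
    void  free(void *ptr);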

Before we continue with the actual implementation of both functions, the most pressing question is how we can get our interception functions called instead of the real malloc(3) and free(3) functions in libc. Obviously, the whole approach only works if our application is linked dynamically against libc (or whatever library we try to intercept); if the program is linked statically, we cannot intercept the functions. Keeping the details of symbol resolution in dynamically linked applications to a minimum (please check this excellent post series for all the gory details), the dynamic loader decides at runtime which function to call by checking and matching the function symbols of all loaded shared libraries. If the same function symbol is exported by multiple loaded shared libraries, the load order matters: the symbol that is found first wins. And this is exactly how we intercept the function calls: we tell the dynamic loader to load our interception library before the real library, in this case libc, is loaded.

Let’s get back to the code. This is how our interception functions are finally implemented:
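
A sketch of such an interception library is shown below (the trace output and the build command in the comment are assumptions; the mechanism follows the description in the next paragraphs):

    /* intercept.c -- sketch of the interception library
     * possible build: gcc -shared -fPIC intercept.c -o libintercept.so -ldl */
    #define _GNU_SOURCE              /* required for RTLD_NEXT */

    #include <dlfcn.h>
    #include <stddef.h>
    #include <unistd.h>

    void *malloc(size_t size)
    {
      /* Cache the address of the real malloc(3); it is resolved only once. */
      static void *(*real_malloc)(size_t) = NULL;
      if (real_malloc == NULL)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

      /* Tracing: write(2) is used instead of printf(3) to avoid recursion; a
       * real tracer would record size and returned pointer, e.g. into a
       * preallocated buffer. */
      write(STDERR_FILENO, "malloc\n", 7);

      return real_malloc(size);
    }

    void free(void *ptr)
    {
      static void (*real_free)(void *) = NULL;
      if (real_free == NULL)
        real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");

      write(STDERR_FILENO, "free\n", 5);

      real_free(ptr);
    }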

The feature test macro _GNU_SOURCE is required to use the macro RTLD_NEXT. Next come the header includes, of which dlfcn.h is the one needed to interact with the dynamic loader, as described below. The following lines of code define our version of the malloc function with exactly the same signature as the original. Inside the malloc function, we first declare a static function pointer for a function with malloc’s signature. The reason for using a static variable is that we do not want to query the address of the real malloc function on every call to malloc; instead, the address is looked up once and then cached “globally”. On the first entry into our malloc function, however, we have to ask the dynamic loader to find this address. This is accomplished by calling dlsym(3) with the special argument RTLD_NEXT. The functions from the dlfcn.h header are generally used on Unix operating systems to load shared libraries at runtime, i.e. as opposed to at application start-up time, and to introspect the loaded libraries. In this context, dlsym(3) can be used to find a symbol’s address in such a library. With the special argument RTLD_NEXT, however, we instruct the dynamic loader to find the address of the next symbol named malloc in the search order, which is supposed to be the original version. While the first argument to dlsym(3) is typically a handle to a loaded shared library, the special argument RTLD_NEXT refers to the shared libraries that come after the current one in the search order, whether they were loaded at application start-up or via dlopen(3). Note that we would end up in an endless loop if RTLD_NEXT were not used, because dlsym(3) would then return the address of our own malloc!

Once we have found the address of the original malloc function, we perform our tracing and finally call the original function through the address retrieved before. That’s it! The code for free(3) is written likewise. A word of caution about the tracing function: while functions from the printf(3) family may generally be used for tracing, they are dangerous inside malloc(3). The reason is that printf(3), when used with format specifiers, may internally call malloc(3), so you might end up with an endless recursion that crashes your stack.

Finally, how do we use the interception library to trace our application? As a highly simplified test, we use the following main program:
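
A minimal test program of this kind could look as follows (file names, allocation size and the commands in the comment are illustrative assumptions):

    /* main.c -- minimal test program
     * possible build:        gcc main.c -o main
     * run without tracing:   ./main
     * run with tracing:      LD_PRELOAD=$PWD/libintercept.so ./main */
    #include <stdlib.h>

    int main(void)
    {
      void *memory = malloc(1024);
      free(memory);
      return 0;
    }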

Let’s compile and run the code:

If we execute the code, we do not get any output on the console, because the shared library libintercept.so is not loaded. This can be verified by running the ldd(1) utility, which displays the shared library dependencies:

To get our interception library loaded, we need to set the environment variable LD_PRELOAD to point to our library. In general, LD_PRELOAD takes a colon- or space-separated list of libraries that are loaded before all other dependent libraries. This can again be verified with the ldd(1) utility:

To combine all of the above, we enable the tracing library for our application by the following invocation:

Variadic Functions

Almost all system calls and libc library functions have a fixed function signature with a pre-defined number of parameters. Exceptions are, among others, the functions from the printf(3) family and the open(2) system call. For the printf(3) functions, interception is easy because libc provides counterparts that take a va_list(3), e.g. vprintf(3), while for open(2) the specification clearly states when the variadic parameter is used. Generic variadic functions, unfortunately, cannot be intercepted unless the library provides a variant that takes a va_list as input argument, as sketched below for printf(3).
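
As an illustration of the va_list-based approach (a sketch only, with the actual tracing left as a comment), a printf(3) interceptor can forward its variadic arguments to vprintf(3):

    #define _GNU_SOURCE              /* for RTLD_NEXT */

    #include <dlfcn.h>
    #include <stdarg.h>
    #include <stdio.h>

    int printf(const char *format, ...)
    {
      /* vprintf(3) is the va_list counterpart of printf(3); resolve it once. */
      static int (*real_vprintf)(const char *, va_list) = NULL;
      if (real_vprintf == NULL)
        real_vprintf = (int (*)(const char *, va_list))dlsym(RTLD_NEXT, "vprintf");

      /* ... tracing of the call could happen here ... */

      va_list args;
      va_start(args, format);
      int result = real_vprintf(format, args);
      va_end(args);
      return result;
    }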

LLVM Out-of-Source Pass

The LLVM software project provides an elegant feature to build plugins/extensions outside of the full-blown source tree. Two benefits of this feature are 1) that build times are reduced and 2) that a self-built LLVM is not required just to implement some small extension. The documentation of this feature is available here. I recently tried the out-of-source build by using the Hello transform pass (/lib/Transforms/Hello) and performing the following steps:

When trying to build the pass for LLVM 3.7.1, I unfortunately encountered the following error message while running CMake:

This problem has already been discussed on the web, e.g. on StackOverflow, and the root cause is that the CMake variable LLVM_ENABLE_PLUGINS is not set. To get this variable defined, an additional CMake file, HandleLLVMOptions, has to be included. By the way, please note that this file is also helpful if LLVM headers are used in the code, because it appends -std=c++11 to the C++ compiler flags automatically; without it, the flag has to be added manually. Including HandleLLVMOptions next yields this error message:

The StackOverflow link above offers a solution for this new problem by defining two CMake variables, which is an acceptable work-around for LLVM 3.7.1. The good news is that this problem is fixed in the latest LLVM release, 3.8.0 (released on March 8, 2016). With the fixed code, only two CMake files have to be included, namely HandleLLVMOptions and AddLLVM. Note: the variable LLVM_ENABLE_PLUGINS is meanwhile set in the file LLVMConfig.cmake, so for 3.8.0 only AddLLVM is strictly required, provided -std=c++11 is added manually or C++11 is not needed in the LLVM pass.
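
For reference, a minimal CMakeLists.txt for such an out-of-source pass against LLVM 3.8.0 might look roughly like this (the project layout and the pass name are assumptions):

    cmake_minimum_required(VERSION 3.4.3)
    project(HelloPass)

    # Locate an installed LLVM and pull in its CMake helper modules.
    find_package(LLVM REQUIRED CONFIG)
    list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
    include(HandleLLVMOptions)   # sets compiler flags such as -std=c++11
    include(AddLLVM)             # provides add_llvm_loadable_module()

    include_directories(${LLVM_INCLUDE_DIRS})
    add_definitions(${LLVM_DEFINITIONS})

    # Build the Hello pass as a loadable module, e.g. LLVMHello.so.
    add_llvm_loadable_module(LLVMHello Hello.cpp)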

LLVM Setup

This short post is about a small Python script that facilitates the setup of LLVM software builds. The documentation on how to build LLVM yourself is great and detailed, e.g. in LLVM Getting Started; however, there is one problem that I am usually confronted with when I want to build LLVM including all its components like clang, compiler-rt, libcxx, etc.: what is the exact download path for each component (either as compressed archive or via SVN/Git), and, more importantly, into which directory inside the LLVM source tree do I have to put each component’s files?

I wrote a small and simple Python script, available on GitHub, that takes care of setting up the LLVM source tree with the LLVM components that you would like to build. For usage details, please check out the GitHub link; here is just a sample command sequence showing how to use the script: