All posts by Andi Hellmund


The Linux package lcov is a set of Perl scripts that convert gcov coverage information into nice-looking HTML pages in which the project’s coverage metrics are concisely visible. Fortunately, the Clang compiler is also capable of generating gcov-compatible data, so lcov may be used with the LLVM tool chain as well. To get LLVM working together with lcov, the following steps have to be performed:

  1. Get the latest version of lcov (at least 1.12) from the project’s web page
  2. Follow the instructions here

Macro Definitions

When analyzing a compiler and/or build error involving macro definitions, e.g. the nasty feature test macros, it can be useful to find out where a macro has been defined and which value it takes. A straightforward approach is to look at the compiler output after pre-processing, typically requested by adding the -E option. Unfortunately, this output only shows the final pre-processed text in which all macros have already been expanded. Fortunately, clang and gcc offer an additional option that prints both the expanded code and the macro definitions at the location where they are defined. To exemplify this, we take the following source program:
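The sample program itself is not reproduced here. A minimal stand-in could look like the sketch below; the file name and the names BUFFER_SIZE, SQUARE and scratchBytes are assumptions made for illustration only, and any translation unit with a few macros would serve the same purpose.

```cpp
// macros.cpp: hypothetical stand-in for the sample program.
// All macro and function names are illustrative assumptions.
#include <cstddef>

// Object-like macro with a guarded default, so that it could be
// overridden on the command line (e.g. -DBUFFER_SIZE=256).
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 128
#endif

// Function-like macro; after plain -E only its expansion remains visible.
#define SQUARE(x) ((x) * (x))

// Uses both macros so that their expansions show up in the output.
std::size_t scratchBytes() {
    return SQUARE(static_cast<std::size_t>(BUFFER_SIZE));
}
```

With -E alone, the body of scratchBytes would only show the expanded expression; with -dD in addition, the #define lines for BUFFER_SIZE and SQUARE are kept in the output as well.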

To finally get the pre-processed output containing the macro definitions, we have to add the -E and -dD arguments to the compile command. Since -E prints the pre-processed file to stdout, we have to redirect the output into a file, typically with the file extension .i:

The pre-processed file is then the following. Please note that the file also includes builtin macros such as __x86_64__, which are truncated in the output below for clarity:

C++ Factory Method with Shared Libraries

Suppose that you are working on a generic framework and you would like to allow its users to extend it with domain-specific functionality. A programming pattern that is typically deployed for this scenario is the so-called factory method pattern. The core components of this pattern are an (abstract) interface class, a set of derived classes and a factory method that creates an object of the requested type and returns it as a pointer to the interface class. A very simplistic coding of the factory method could be the following:
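As the original listing is not available here, the following is only a minimal sketch of such a factory method. The member function name() and the string-based type selection are assumptions of this sketch; only the names Interface, ClassA, ClassB and factoryMethod are taken from the post.

```cpp
#include <memory>
#include <string>

// Abstract interface implemented by every concrete type.
class Interface {
public:
    virtual ~Interface() = default;
    virtual std::string name() const = 0;  // hypothetical member for illustration
};

class ClassA : public Interface {
public:
    std::string name() const override { return "ClassA"; }
};

class ClassB : public Interface {
public:
    std::string name() const override { return "ClassB"; }
};

// Factory method: maps a type name to a freshly created object that is
// returned through the interface. Unknown names yield nullptr.
std::unique_ptr<Interface> factoryMethod(const std::string& type) {
    if (type == "ClassA") {
        return std::unique_ptr<Interface>(new ClassA());
    }
    if (type == "ClassB") {
        return std::unique_ptr<Interface>(new ClassB());
    }
    return nullptr;
}
```

Every new type requires another branch in factoryMethod, which is exactly the coupling that the rest of this post tries to remove.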

If you are working on a very small, possibly company-internal, framework, it might be acceptable to share your complete code base with all programmers and allow them to modify the above code when adding new type classes, e.g. ClassC, ClassD, etc. If, however, your code base is fairly large, e.g. with overall build times of more than 30 minutes, or if you do not want to share the code base, e.g. due to IP restrictions, you might want users of your framework to provide the functionality for new types as shared libraries that are built out-of-source and loaded at runtime when requested. This post sketches how this could be implemented in C++ for the prototypical use case above.

The first ingredient for our recipe is the interface definition which is the Interface class plus an evil registration macro to streamline and unify the plugin handling:

As we will see later in this post, every shared library is required to export the same, uniquely named entry point to be loadable by our framework. This entry point is created by the macro PLUGIN_CLASS. Note the use of extern "C", which disables the C++ name mangling such that the function is exported as createPlugin. Every plugin is required to use the macro PLUGIN_CLASS exactly once! Given the interface, we next define our shared library for ClassA:
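Since the listings are not reproduced here, the following minimal sketch shows how the interface, the PLUGIN_CLASS macro and a ClassA plugin might fit together. The member function name() and the raw-pointer return type of createPlugin are assumptions of this sketch, not necessarily what the original code used.

```cpp
#include <string>

// Interface shared between the framework and all plugins.
class Interface {
public:
    virtual ~Interface() = default;
    virtual std::string name() const = 0;  // hypothetical member for illustration
};

// Emits the single C-linkage entry point of a plugin. A raw pointer is
// returned (an assumption of this sketch) because a C++ class type is a
// questionable return type for an extern "C" function; the caller takes
// ownership of the object.
#define PLUGIN_CLASS(Type) \
    extern "C" Interface* createPlugin() { return new Type(); }

// Plugin side, e.g. a source file that is built as a shared library.
class ClassA : public Interface {
public:
    std::string name() const override { return "ClassA"; }
};

PLUGIN_CLASS(ClassA)
```

Using the macro twice in one library would define createPlugin twice and fail to compile, which enforces the once-per-plugin rule mentioned above.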

Next, we compile the above code as shared library and inspect the exported symbols of the shared library with the nm(1) utility to check that our entry point is available:

Note that the capital T – for text – in the output of nm(1) indicates a defined and exported function. Having defined the prerequisites for our framework, we now take a look at the new factory method:

The main function is the same as before, with the sole exception that our main program requires the file name of a shared library to be given as first program argument. The new factory method factoryMethod takes the file name of the shared library, but still returns a unique pointer to the Interface class. Inside the factory method, the central parts are the calls to the functions dlopen(3) and dlsym(3), which load a shared library at runtime and query the address of a symbol inside this library, respectively.

But let’s look at the code line by line. In order to map the function’s address to a C++ function object, i.e. std::function, we define the function signature (line 6) and the respective std::function type (line 7). The shared library is then opened in line 10 by dlopen(3), which returns an opaque library handle. The first parameter to dlopen(3) is the path or file name of the shared library, while the second parameter configures when unresolved symbols inside the shared library are resolved. The two possible settings are RTLD_LAZY, which performs the resolution only when the symbol is first referenced, and RTLD_NOW, which performs the resolution immediately at load time. We choose the former for performance reasons. In line 14 we finally query the shared library, identified by the opaque library handle, for the address of the function createPlugin, which is cast to the std::function in line 16 and eventually called in line 17.
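The listing that these line numbers refer to is not reproduced here. The following sketch follows the same steps under a few assumptions: the entry point returns a raw Interface pointer, the type aliases CreateSignature and CreateFunction are hypothetical names, and the line numbers mentioned above do not apply to this sketch.

```cpp
#include <dlfcn.h>

#include <functional>
#include <memory>
#include <string>

class Interface {
public:
    virtual ~Interface() = default;
    virtual std::string name() const = 0;  // hypothetical member for illustration
};

// Signature of the entry point exported by every plugin, and the
// matching std::function type.
using CreateSignature = Interface*();
using CreateFunction = std::function<CreateSignature>;

// Loads the shared library at `path` and asks it for a new object.
// Returns nullptr if the library or its entry point cannot be found.
std::unique_ptr<Interface> factoryMethod(const std::string& path) {
    // RTLD_LAZY defers the resolution of function symbols until first use.
    void* handle = dlopen(path.c_str(), RTLD_LAZY);
    if (handle == nullptr) {
        return nullptr;
    }

    // Look up the unmangled entry point by name.
    void* symbol = dlsym(handle, "createPlugin");
    if (symbol == nullptr) {
        dlclose(handle);
        return nullptr;
    }

    // Convert the object pointer returned by dlsym into a callable.
    CreateFunction create = reinterpret_cast<CreateSignature*>(symbol);

    // The handle is intentionally not closed: the code of the created
    // object lives inside the library and must outlive the returned pointer.
    return std::unique_ptr<Interface>(create());
}
```

On older glibc versions the program additionally has to be linked with -ldl. A production version would probably also report dlerror(3) instead of silently returning nullptr.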

Compiling and running the code then yields the desired results:

Library Call Interception

In the context of system analysis and system troubleshooting, tracing and intercepting individual function calls, e.g. system calls, from user-space processes might be required or at least useful. When your system is running an up-to-date version of Linux, probing could be done with SystemTap or, more specifically for functions of the malloc family, with malloc hooks. This post shows a Unix-generic solution to this problem that relies on symbol interposition and pre-loading of shared libraries at runtime. Although this approach is not tailored to Linux, the examples are compiled and executed on an Ubuntu 14.04 system. The examples are known to be applicable to AIX, HP-UX and Solaris.

A specific use case in which the approach below proved applicable was the analysis of memory leaks in a program that performed a large number of small memory allocations which in total summed up to many gigabytes. Due to the high pressure on the dynamic memory allocator, approaches like compiler instrumentation (modern tools like the LLVM/Clang Leak Sanitizer were not available at the time) or in-depth program and heap analysis, e.g. by Valgrind, were not applicable due to speed issues. Our solution was to perform high-speed tracing of the malloc(3), calloc(3), realloc(3) and free(3) functions and to postpone the leak analysis to an off-line process running on the recorded runtime data.

As a first demonstration we try to intercept function calls to malloc(3) and free(3) and compile the two functions into a shared library. To start, we first look up the function declarations of malloc(3) and free(3), for example by checking their respective man pages. The signatures of both functions are:

Before we continue with the final coding of both functions, the most urgent question is how our interception functions can get called instead of the real malloc(3) and free(3) functions in libc. Obviously, the whole approach only works if our application is linked dynamically to libc (or whatever library we try to intercept). If the program is linked statically, we cannot intercept the functions. To keep the details of symbol resolution in dynamically linked applications at a minimum (please check this excellent post series for all the gory details): the dynamic loader decides at runtime which function to call by checking and matching the function symbols of all loaded shared libraries. If the same function symbol is exported by multiple loaded shared libraries, the load order matters and the first exported symbol is preferred. And this is exactly how we intercept the function calls: we tell the dynamic loader to load our interception library before the real library, in this case libc, is loaded.

Let’s get back to the code. This is how our interception functions are finally implemented:

The feature test macro _GNU_SOURCE is required to use the macro RTLD_NEXT. Next come the header includes, of which dlfcn.h is the one needed to interact with the dynamic loader, as will be described later. The following lines of code define our version of the malloc function, which has exactly the same signature as the original version. Inside the malloc function, we first create a static function pointer for a function having malloc’s signature. The reason for using a static variable is that we do not want to query the address of the real malloc function on every call to malloc, so this address is cached once “globally”.

On the first entry into our malloc function, however, we have to ask the dynamic loader to find this address. This is accomplished by calling dlsym(3) with the special argument RTLD_NEXT. The functions from the dlfcn.h header file are in general used on Unix operating systems to load dynamic libraries and to introspect the loaded dynamic libraries at runtime, as opposed to at application start-up time. In this context, dlsym(3) finds a symbol’s address in such a dynamic library. With the special argument RTLD_NEXT we instruct the dynamic loader to find the address of the next symbol in the search order having the name malloc, which is supposed to be the original version. While the first argument to dlsym(3) typically is a handle to one loaded shared library, the special argument RTLD_NEXT is not tied to a single library but continues the search in the loaded shared libraries that follow the current one in the search order. Note that without RTLD_NEXT the lookup would find our own malloc again and we would construct an endless recursion!

Once we have found the address of the original malloc function, we perform our tracing and finally call the original function through the address retrieved before. That’s it! The code for free(3) is written likewise. A word of caution about the tracing function: while functions from the printf(3) family may generally be used for tracing, they are critical for malloc(3). The reason is that printf(3), when used with format specifiers, may internally call malloc(3), so that you might end up with an endless recursion that overflows your stack.
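The original listing is not reproduced here. The steps described above can be sketched as follows, assuming Linux with glibc; the counter g_tracedCalls and the helper trace are hypothetical additions that only make the effect observable, and the sketch is written in C++ with C linkage rather than in plain C.

```cpp
// intercept.cpp: a sketch, assuming Linux with glibc.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for RTLD_NEXT (g++ usually predefines it)
#endif
#include <dlfcn.h>
#include <unistd.h>

#include <cstddef>

// Hypothetical counter of intercepted calls; not thread-safe.
unsigned long g_tracedCalls = 0;

// Minimal tracing via write(2), which avoids the printf(3) family and
// therefore any hidden call back into malloc(3).
static void trace(const char* message, std::size_t length) {
    ssize_t ignored = write(STDERR_FILENO, message, length);
    (void)ignored;
    ++g_tracedCalls;
}

extern "C" void* malloc(std::size_t size) {
    using MallocFn = void* (*)(std::size_t);
    // Cached address of the next malloc in the search order (libc's).
    static MallocFn realMalloc = nullptr;
    if (realMalloc == nullptr) {
        realMalloc = reinterpret_cast<MallocFn>(dlsym(RTLD_NEXT, "malloc"));
    }
    trace("malloc\n", 7);
    return realMalloc != nullptr ? realMalloc(size) : nullptr;
}

extern "C" void free(void* pointer) {
    using FreeFn = void (*)(void*);
    // Cached address of the next free in the search order (libc's).
    static FreeFn realFree = nullptr;
    if (realFree == nullptr) {
        realFree = reinterpret_cast<FreeFn>(dlsym(RTLD_NEXT, "free"));
    }
    trace("free\n", 5);
    if (realFree != nullptr) {
        realFree(pointer);
    }
}
```

The static pointers are initialized with a constant and checked by hand, so no hidden initialization guard runs inside malloc. A real tracer would also record the size and the returned address, for example by formatting them into a fixed buffer without allocating.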

Finally, how do we use the interception library to trace our application? As a highly simplified program we use the following main program to test our library:

Let’s compile and run the code:

If we execute the code, we do not get any output on the console, because the shared library is not loaded. This could be verified by running the ldd(1) utility to display dependent shared libraries:

To get our interception library loaded, we need to set the environment variable LD_PRELOAD to point to our library. In general, the environment variable LD_PRELOAD takes a colon- or space-separated list of libraries to be loaded before the dependent libraries are loaded. This could be verified by again using the ldd(1) utility:

To combine all of the above, we enable the tracing library for our application by the following invocation:

Variadic Functions

Almost all of the system calls and libc library functions have a fixed function signature with a pre-defined number of parameters. Exceptions are, among others, the functions from the printf(3) family and the open(2) system call. For the printf(3) functions, the interception is easy because libc provides counterparts that take a va_list, while for open(2) the specification states clearly when the variadic parameter is used: the mode argument is only evaluated when a file may be created, e.g. with O_CREAT. For generic variadic functions, it is unfortunately not possible to intercept them unless the library provides a function taking a va_list as input argument.
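The forwarding through a va_list can be illustrated with a small sketch. The wrapper name tracedSnprintf is hypothetical; it is not an interception function itself, but shows the mechanism such a function would use to hand its variable arguments to the va_list-taking counterpart vsnprintf(3).

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical wrapper with the signature of snprintf(3): the variable
// arguments are packed into a va_list and forwarded to vsnprintf(3).
extern "C" int tracedSnprintf(char* buffer, std::size_t size,
                              const char* format, ...) {
    va_list arguments;
    va_start(arguments, format);
    // Tracing of the call would go here.
    int written = std::vsnprintf(buffer, size, format, arguments);
    va_end(arguments);
    return written;
}
```

Without a counterpart like vsnprintf(3), the wrapper would have no portable way to pass on arguments whose number and types it does not know.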

LLVM Out-of-Source Pass

The LLVM software project provides an elegant feature to build plugins/extensions outside of the full-blown source tree. Two of the benefits of this feature are that 1) build times are reduced and 2) a self-produced LLVM build is not required just to implement some small extension. The documentation of this feature is available here. I recently tried the out-of-source build by using the Hello transform pass (/lib/Transforms/Hello) and performing the following steps:

When trying to build the pass for LLVM 3.7.1, I unfortunately encountered the following error message, while running CMake:

This problem is already discussed on the web, e.g. on StackOverflow, and the root cause is that the CMake variable LLVM_ENABLE_PLUGINS is not set. To get this variable defined, an additional CMake file, HandleLLVMOptions, has to be included. By the way, this file is also helpful if LLVM headers are used in the code, because it appends -std=c++11 to the C++ compiler flags automatically; otherwise, the flag has to be added manually. Including HandleLLVMOptions next yields this error message:

The StackOverflow link above offers a solution for this new problem by defining two CMake variables, which is an acceptable work-around for LLVM 3.7.1. However, the good news is that this problem is fixed in the latest LLVM release 3.8.0 (released on March 8, 2016). Given the fixed code, only two CMake files have to be included, i.e. HandleLLVMOptions and AddLLVM. Note: the variable LLVM_ENABLE_PLUGINS is now set in the file LLVMConfig.cmake, so that for 3.8.0 only AddLLVM is strictly required if -std=c++11 is added manually or C++11 is not needed in the LLVM pass.

LLVM Setup

This short post is about a small Python script that facilitates the setup of LLVM software builds. The documentation on how to build LLVM yourself is great and detailed, e.g. in LLVM Getting Started. However, there is one problem that I am usually confronted with once I want to build LLVM including all its components like clang, compiler-rt, libcxx, etc.: what is the exact download path for each component (either compressed archives or SVN/Git) and, more importantly, which directory inside the LLVM source tree do I have to put the components’ files into?

I wrote a small and simple Python script, available on GitHub, that sets up the LLVM source tree with the LLVM components that you would like to build. For usage details, please check out the GitHub link; here is just a sample command sequence showing how to use the script:

GCC front-end whitepaper

In the last couple of months, I created a white paper about GCC front-end internals. This white paper is not yet complete and there are many other areas of the compiler which are worth describing.

So, if you do have any valuable feedback for the white paper or if you do have areas which you wish to get some internal documentation about, please just let me know and I will think about adding more sections.

Just drop me a comment or an eMail ( …

The white paper could be downloaded here!

Simplistic GNU makefile for lazy programmers

A friend of mine and I were recently discussing the layout of a GNU makefile that allows you to add source files to your source tree in an arbitrary directory hierarchy without having to modify the makefile. As further requirements, we wanted only a single top-level makefile in the build directory, i.e. no cascading sub-directory makefiles, and the makefile should be able to handle multiple source files with the same name.

Beside the usual make logic, GNU make provides the makefile writer with a good set of helper functions, e.g. for text replacement or directory/file operations. For a full reference of GNU make functions, please check out the official documentation.

As mentioned in the title, this makefile is very simplistic. It makes the following assumptions about the software project. For now it is oriented towards software development in C, but it should be easy to extend it to other languages.

  • all the source files are located in a single directory, e.g. src
  • there is a separate build directory containing the created object files, e.g. build (this is not a hard requirement, but I like to separate the sources from the object and executable files)
  • all source files having the same file type will be built with the same compiler options

So, here it is:

So, what is this makefile doing?

The first line collects all the source files, in this case all C source files, from the source directory. The $(shell …) function executes any shell command from within the makefile. These specific shell commands only record the file path relative to the source directory. I will explain below why this is useful. The second line is a text replacement to determine the object files corresponding to the source files.

The next interesting line is the vpath command. The vpath command allows the makefile writer to specify directories where make should look for dependencies in addition to the local directory. It furthermore allows you to specify alternative directories based on the file extension. Here, we say that make should also look for .c files in the source directory.

Finally, the most important part of the makefile is the generic rule to translate .c files into .o files (%.o: %.c). This rule is used for each object file requested as dependency for the final executable (bin). For example, assume that the final executable only depends on the object file generic/a/bin.o, then GNU make will internally handle the generic rule as:

The next important aspect here is that we use the special make variables $< (referring to the first dependency) and $@ (referring to the target). This rule is also where the vpath instruction from the beginning of the makefile comes into play. When searching for the file generic/a/bin.c as a dependency, GNU make will first try to find this file relative to the current directory. Since it does not exist there, make will then search relative to the vpath directory, i.e. the source directory. There, GNU make will find the file and use it. As a short side note about this rule: the $(dir …) make function extracts the directory part of a file name, and the dash (-) in front of the command tells make to ignore the return value of the mkdir command. This is a somewhat nasty hack, but it simplifies the makefile by avoiding checks for directory existence.

As said, this makefile is very simplistic and might not match all requirements. However, it at least shows some interesting aspects and functions of GNU make.

GCC front-end (last): official GCC internal documentation

This post is part of a series about GCC internals and specifically about how-to create a new language front-end for GCC. For a list of related posts, please check this page.

As part of his Google Summer of Code project, one of the future GCC contributors (redbrain) decided to extend the currently available internal GCC documentation by a detailed tutorial about GCC front-ends. I finally decided that the GCC internals document is the right place for such a tutorial so that I’ll contribute my findings and a really supported front-end skeleton including IR generation.

For this reason, I’ll stop this still very young series about GCC front-ends. Instead, I’m hopefully going to talk a bit about the newly available feature of GCC 4.5.0 named link-time optimization (LTO). As I’m currently working on a dumping tool for LTO intermediate files (more about that later), I’ll give some insights into the details of LTO’s implementation …

GCC front-end (3): makefile

This post is part of a series about GCC internals and specifically about how-to create a new language front-end for GCC. For a list of related posts, please check this page.

As described in the last post, each language front-end has its own makefile or makefile fragment, named (located in the front-end directory), which gets called by the main makefiles in the toplevel build directory (build-x.y.z) and the gcc sub-directory (build-x.y.z/gcc). This post will only go through the major make targets and assumes a fundamental understanding of the make utility. As a reference for the rest of this post, please check out the file from the GCC front-end skeleton here.

As an entry point to the GCC makefile hierarchy, let’s consider which targets are called when building GCC and specifically a GCC front-end. Assuming a bootstrap build with the C++ front-end as an example, whenever you do a ‘make’ (which is basically ‘make all’) or a ‘make bootstrap’, the following targets get called. The targets which must be available in the language makefile are marked bold.

[Makefile in the toplevel build directory]
   -> stage3-bubble
     -> all-stage3
       -> all-stage3-gcc

The all-stage3-gcc target changes into the gcc directory and calls make all:

[Makefile in gcc subdirectory]
   -> all.internal
     -> native
       -> c++
     -> start.encap
       -> lang.start.encap
         -> c++.start.encap
     -> rest.encap

These three language targets are the main targets for building the compilation driver and the compiler. Other targets like for building the documentation are not discussed here, while the installation target will be discussed in the final section of this post. The following rules & common practices apply to these three targets:

  • c++ (or <lang>): This target is usually used to build the core compiler, e.g. cc1plus
  • c++.start.encap (or <lang>.start.encap): This target allows you to include all those parts which don’t rely on a working gcc-driver version. A working gcc-driver version in this context just means a gcc-driver created by this build, because the gcc-driver (usually called xgcc before the installation) is also built by this target. Hence, this target is usually used to build the compilation drivers, e.g. g++
  • c++.rest.encap (or <lang>.rest.encap): This target finally allows you to include all those parts which rely on a working gcc-driver version, so if your front-end requires any parts to be built by the newly created gcc (not the host gcc generally used for the build), put those targets here. I checked several GCC front-ends and none of them uses this target.

Next, I’ll go through a sample make command to explain and show how to include dependent libraries or how to get the GCC backend integrated into your compiler:

So, the GCC infrastructure provides a lot of variables to simplify the dependency notation and the build commands. Because there are so many variables defined by the infrastructure, I won’t list them here, except the variable BACKEND. The BACKEND variable lists all the object files provided by the GCC infrastructure to connect your front-end to the middle-end and back-end of GCC.
For a reference of the most commonly used variables, please check the makefile of the front-end skeleton and the makefiles of GCC-integrated front-ends (e.g. c++, java, fortran, etc.). If you would like to know which variable has what specific value, I can only recommend grepping the makefile in the gcc subdirectory.

Front-end installation

For the installation of the front-end executables, the language front-end needs to define a separate installation target, named <lang>.install-common:

When going through the above makefile extract, you will notice that the installation target only installs the compilation driver and not the compiler. The compiler gets installed automatically by the GCC makefile. If you are interested in the details, the compiler gets installed by the target install-common of the gcc makefile. While the compilation driver is installed in the {prefix}/bin directory, the compiler is put into the {prefix}/libexec/gcc/<target_noncanonical>/<version> directory. As a note, {prefix} is the directory specified for the --prefix option of the configure script, <target_noncanonical> is a string like x86_64-unknown-linux-gnu (YMMV) and <version> usually has a form like x.y.z.