Tag Archives: compiler

LLVM Setup

This short post is about small Python script that facilitates the setup of LLVM software builds. The documentation about how-to build LLVM by yourself is great and detailed in e.g. LLVM Getting Started, however there is one problem that I am usually confronted with once I want to build LLVM including all its components like clang, compiler-rt, libcxx, etc.: what is the exact download path for each component (either compressed archives or SVN/GIT) and more importantly which directory inside the LLVM source tree do I have to put the components’ files into?

I wrote a small and simple Python script, available on Github that takes care for you to setup the LLVM source tree containing the LLVM components that you would like to build. For usage details, please checkout the Github link, here is just a sample command sequence on how-to use the script:

GCC front-end whitepaper

In the last couple of months, I created a white paper about GCC front-end internals. This white-paper is not yet complete and there are many many other areas of the compiler which are worth describing.

So, if you do have any valuable feedback for the white paper or if you do have areas which you wish to get some internal documentation about, please just let me know and I will think about adding more sections.

Just drop me a comment or an eMail (andi@mail.lxgcc.net) …

The white paper could be downloaded here!

GCC front-end (2): language-specific files

This post is part of a series about GCC internals and specifically about how-to create a new language front-end for GCC. For a list of related posts, please check this page.

The last post gave a short introduction into the differences between compiler and (compilation) driver and how-to control the different phases of the gcc/g++ drivers. This post now explains the basic file and directory structure of a GCC front-end (the file and directory structure of the GCC project in general will be shown where appropriate in the context of the front-end explanation).

Except the C compiler, the source code for each GCC front-end is located in a separate directory, namely gcc-x.y.z/gcc/any_name, e.g. gcc-x.y.z/gcc/any_sample_fe. Just as a note in this context, to configure GCC for a new front-end with the –enable-languages configure option, don’t use the directory name, but the language name configured in gcc-x.y.z/gcc/any_sample_fe/config-lang.in as described below.

After the extraction of the sample, minimal front-end as described here, the following files shall be typically found in the front-end directory:

General front-end language configuration used by the configure/make build process, like the name of the language (parameter: language) or the file name of the compiler (parameter: compilers) among others. This file gets read by the configure scripts (gcc-x.y.z/configure, gcc-x.y.z/gcc/configure, etc.) and the contents get incorporated into the generated makefiles. For a detailed description of the single parameters, please check the GCC internals manual (Chapter available here.

Language-specific driver and compiler options which are automatically parsed by the GCC common code and passed to a front-end specific function for final analysis and internal processing. For a sample file layout, check out the files gcc-x.y.z/gcc/c.opt and gcc-x.y.z/gcc/common.opt

This is the specification used by the GCC driver infrastructure to handle the specific phases of the compilation process. As described in the last post, GCC selects the phases and corresponding tools (e.g. compiler) based on file extensions. Though assume, as an example, you want to create a new language which shouldn’t be directly translated into machine code, but beforehand into C/C++ code. Assume furthermore that your new language files end in ‘.my_c_ext’. By using the lang-specs.h file, you could then instruct your language driver to pass your new language file to your compiler which creates a ‘.c’ file. This file could/will then further be processed by the common tools, e.g. cc1, as, ld, etc. For a sample file layout, please check the file gcc-x.y.z/gcc/gcc.c which greatly explains the syntax of the lang-specs.h file.

Language-specific makefile fragment. This fragments gets included into the GCC Makefile (builddir/gcc/Makefile). This makefile fragment should contain the instructions to build and install the language-specific driver, compiler, man pages and documentation.

lang-tree.def, e.g. sample_fe-tree.def
To simplify the creation of new front-ends and the interaction of the front-end with the middle-end/back-end, GCC provides a language-independent abstract-syntax tree (AST) named GENERIC. GENERIC is a tree-based representation while each tree node has a unique tree code. All the tree codes available by GCC are listed in the file gcc-x.y.z/gcc/tree.def. Most of these tree codes suffice the purposes of a new language front-end, but you might need additional ones for specific language constructs. These extra tree codes are put into a language-specific tree definition file. The naming convention for this file is to use the front-end directory as the first part of the name instead of the language parameter in the config-lang.in file. For example, the C++ front-end – located in the directory gcc-x.y.z/gcc/cp – defines the file gcc-x.y.z/gcc/cp/cp-tree.def. For a sample layout, please check the file gcc-x.y.z/gcc/tree.def. But please keep in mind, the GCC middle-end expects GIMPLE – three-address code tuples – as input (in earlier version of GCC, GIMPLE was a tree-based intermediate representation based on GENERIC). The tree codes in gcc-x.y.z/gcc/tree.def could be automatically transformed into GIMPLE by the GCC gimplifier, but additional tree codes must be manually transformed into GIMPLE code.

driver-specific source files
compiler-specific files
All the source files for the driver, compiler or whatever tools are required for the new language front-end.

GCC front-end (1): driver vs. compiler

This post is part of a series about GCC internals and specifically about howto create a new language front-end for GCC. For a list of related posts, please check this page.

If I would ask many people what the executable gcc is doing, most of the people would answer, “Well, it’s a compiler, though … it’s compiling the source file into a target file”. But that is NOT correct. The executable gcc is not a compiler although the abbreviation means GNU C compiler. gcc represents what is generally called a compiler driver or more generic a compilation driver, in the following just called driver. If you now think, this guy is completely insane, please add the -v option to one of your gcc commands and and check what gcc is really doing.

While a compiler really is only responsible for transforming the source file into (possibly optimized) target machine code, the driver is the high-level organizer in the overall compilation process creating a object/shared/executable file from one or more source files. Thereby, the driver divides the compilation process into several phases which greatly depend on the capabilities of the used programs. gcc in newer versions (for example 4.4.0 and newer) uses the following phases assuming that an executable is created from a single source file (gcc source.c -o exec):

1. compiler (cc1)
2. assembler (as; from GNU binutils)
3. collect2 (collect2; part of GCC)
   3.1. linker (ld; from GNU binutils)

Just a note: in earlier versions of gcc, the driver added a separate pre-processing phase before the compiler phase, but in the recent versions the pre-processor is omitted, because the compiler (cc1) incorporates the pre-processor, at least when using the default options. By using the -no-integrated-cpp option, you instruct the driver to split the compiler phase into 2 distinct phases: pre-processing and compiler.

Each of the phases above produces intermediate or temporary files (assuming that the -pipe option is not specified) as output files which then serve as input files for the subsequent phase, in detail:

  • pre-processor (if separate phase)
    • input: *.c
    • output: *.i
  • compiler
    • input: *.c & *.i files
    • output: *.s
  • assembler
    • input: *.s
    • output: *.o
  • collect2/linker
    • input: *.o
    • output: self-defined pattern, e.g. *.exe

Based on the file extension, the driver knows which phase to start with when processing the file. Though, if a *.s file is given as input file along with a *.o and *.c file to produce an executable in a single gcc command, gcc would run the *.s file through the assembler phase, the *.c file through the compiler and assembler phase and the resulting three *.o files finally through the collect2/linker phase. By the way, if an input file extension matches none of the defined extensions, the file is taken is collect2/linker input!

The default behavior of gcc is to try to produce an executable file from the input files. But, gcc could be instructed to stop after any of the above phases by using specific driver options:

-E : stop after pre-processing, produce a *.i file
-S : stop after compiler, produce a *.s file
-c : stop after assembler, produce a *.o file
none : stop after collect2/linker

Hint: If you want to run through all the phases with a single gcc command but neverthess keep the intermediate files, use the -save-temps option.

You may ask, why all this is important for a new language front-end in gcc? Right! Because your new language might require different or additional phases than the described ones and then you should know where to start with to bring your new language front-end driver to execute your specific phases. For this purpose, GCC (capital letters are used to distinguish the complete compiler project from the C-specific driver) is designed in a modular way to allow the front-ends to “register” new phases, but the addition/modification of the phases will be discussed in a later post. However, for the already interested reader, please take a look at the file gcc-x.y.z/gcc/gcc.c and one of the the files of an already existing front-end, for example of C++ in gcc-x.y.z/gcc/cp/lang-specs.h.