GCC front-end (1): driver vs. compiler

This post is part of a series about GCC internals, specifically about how to create a new language front-end for GCC. For a list of related posts, please check this page.

If I asked people what the executable gcc does, most would answer, “Well, it’s a compiler, so it compiles a source file into a target file”. But that is NOT correct. The executable gcc is not a compiler, although the abbreviation means GNU C compiler. gcc is what is generally called a compiler driver or, more generically, a compilation driver; in the following it is just called the driver. If you now think this guy is completely insane, add the -v option to one of your gcc commands and check what gcc is really doing.

While a compiler is really only responsible for transforming a source file into (possibly optimized) target machine code, the driver is the high-level organizer of the overall compilation process, which creates an object, shared, or executable file from one or more source files. To do so, the driver divides the compilation process into several phases, which depend greatly on the capabilities of the programs used. Newer versions of gcc (for example 4.4.0 and later) use the following phases, assuming that an executable is created from a single source file (gcc source.c -o exec):

1. compiler (cc1)
2. assembler (as; from GNU binutils)
3. collect2 (collect2; part of GCC)
   3.1. linker (ld; from GNU binutils)

Just a note: in earlier versions of gcc, the driver ran a separate pre-processing phase before the compiler phase. Recent versions omit it because the compiler (cc1) incorporates the pre-processor, at least with the default options. The -no-integrated-cpp option instructs the driver to split the compiler phase into two distinct phases again: pre-processing and compilation.

Each of the phases above produces intermediate or temporary files (assuming that the -pipe option is not specified) as output files which then serve as input files for the subsequent phase, in detail:

  • pre-processor (if separate phase)
    • input: *.c
    • output: *.i
  • compiler
    • input: *.c or *.i
    • output: *.s
  • assembler
    • input: *.s
    • output: *.o
  • collect2/linker
    • input: *.o
    • output: self-defined pattern, e.g. *.exe
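The chain of intermediate files can be sketched in a few lines of Python. This is only an illustrative model, not how the driver is implemented: the function name intermediate_files is hypothetical, the separate pre-processor phase is left out, and the file names are derived from the source name for readability, whereas the real driver normally uses temporary file names.

```python
from pathlib import PurePath

# Output suffix of each intermediate phase, in the order described above.
# Simplified model for a single C source file; not GCC's real phase table.
INTERMEDIATE_PHASES = [
    ("compiler", ".s"),
    ("assembler", ".o"),
]

def intermediate_files(source, executable):
    """Return a list of (phase, input file, output file) tuples."""
    steps = []
    current = PurePath(source)
    for phase, out_suffix in INTERMEDIATE_PHASES:
        output = current.with_suffix(out_suffix)
        steps.append((phase, str(current), str(output)))
        current = output  # the output feeds the next phase
    # The last phase writes the file name chosen by the user (-o).
    steps.append(("collect2/linker", str(current), executable))
    return steps

for phase, src, dst in intermediate_files("source.c", "exec"):
    print(f"{phase}: {src} -> {dst}")
```

Each phase consumes the output of the previous one, which is why the loop carries the current file name forward.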

Based on the file extension, the driver knows which phase to start with when processing a file. Thus, if a *.s file is given as input along with a *.o and a *.c file to produce an executable in a single gcc command, gcc runs the *.s file through the assembler phase and the *.c file through the compiler and assembler phases. The three *.o files (the two resulting ones plus the one given) finally go through the collect2/linker phase. By the way, if an input file extension matches none of the defined extensions, the file is taken as collect2/linker input!
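The extension-based dispatch described here might be modeled as follows. This is a minimal sketch under stated assumptions: the table START_PHASE and the function phases_for are hypothetical names, only a handful of C-related extensions are listed, and the real driver knows many more suffixes.

```python
import os

PHASES = ["compiler", "assembler", "collect2/linker"]

# Index into PHASES at which processing starts for a given extension.
# Hypothetical, heavily reduced table for illustration only.
START_PHASE = {".c": 0, ".i": 0, ".s": 1, ".o": 2}

def phases_for(path):
    """Return the phases a single input file passes through."""
    ext = os.path.splitext(path)[1]
    # Unknown extensions are handed directly to collect2/linker.
    start = START_PHASE.get(ext, len(PHASES) - 1)
    return PHASES[start:]

for name in ["part1.s", "part2.o", "part3.c", "unknown.xyz"]:
    print(name, "->", ", ".join(phases_for(name)))
```

Note that the sketch lists the link phase per file only for clarity; the driver runs collect2/linker once over all object files together.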

By default, gcc tries to produce an executable file from the input files. However, gcc can be instructed to stop after any of the above phases by using specific driver options:

-E : stop after pre-processing; the pre-processed source is written to standard output unless -o names a file (e.g. a *.i file)
-S : stop after compiler, produce a *.s file
-c : stop after assembler, produce a *.o file
none : stop after collect2/linker
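As a rough model of these options, the following Python sketch computes which logical phases run for a given argument list. The names STOP_AFTER and phases_run are hypothetical, and the rule that the earliest stopping point wins when several of these options are combined is an assumption of this sketch, not something stated above.

```python
# Logical phases in order; with default options the pre-processor is
# integrated into the compiler, but it is listed separately here.
PHASES = ["pre-processor", "compiler", "assembler", "collect2/linker"]

# Driver option -> last phase that is executed.
STOP_AFTER = {"-E": "pre-processor", "-S": "compiler", "-c": "assembler"}

def phases_run(args):
    """Return the phases executed for the given gcc-style arguments."""
    last = len(PHASES) - 1  # no option: run through collect2/linker
    for arg in args:
        if arg in STOP_AFTER:
            # Assumption: the earliest stopping point takes precedence.
            last = min(last, PHASES.index(STOP_AFTER[arg]))
    return PHASES[: last + 1]

print(phases_run(["-c", "source.c"]))
print(phases_run(["source.c", "-o", "exec"]))
```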

Hint: If you want to run through all the phases with a single gcc command but nevertheless keep the intermediate files, use the -save-temps option.

You may ask why all this is important for a new language front-end in gcc. Because your new language might require different or additional phases than the ones described, and then you should know where to start in order to make your new front-end's driver execute your specific phases. For this purpose, GCC (capital letters are used to distinguish the complete compiler project from the C-specific driver) is designed in a modular way that allows front-ends to “register” new phases. The addition and modification of phases will be discussed in a later post. The already interested reader may take a look at the file gcc-x.y.z/gcc/gcc.c and at one of the files of an existing front-end, for example that of C++ in gcc-x.y.z/gcc/cp/lang-specs.h.

6 thoughts on “GCC front-end (1): driver vs. compiler”

  1. Gcc as a compilation suite when installed on any one platform (chip, OS), is a target dependent technology that passes the source code into a first pass (pre-processor) where the host’s target dependent header files redefine the source code via expansion resulting in a target defined source code. At that point, what is being compiled by the frontend of the compiler proper of gcc is a target dependent compiler front end.

    Gcc has never been a portability technology where one single source code is compiled by a language dependent, target independent frontend into an architecture neutral Intermediate Language (language independent, target independent) which is then passed to a target dependent back end for code generation/optimization.

    That said, gcc suffers from the same frontend host/target dependencies that almost all conventionally produced compilers contain. For example, in the frontends, you may find a target dependent definition for sizeof integer, which is most often done to try and eke out more speed in the total compilation.

  2. Hi Christ, hey John,

    yes, that’s right, the abbreviation GCC (upper-case letters) stands for GNU Compiler Collection. But the abbreviation gcc (lower-case letters) stands for GNU C compiler since it is the C compiler in the compiler collection.

    But thanks for reminding the difference …

    Best regards,
    Andi

  3. Hi Tom,

    thanks for your comment and sorry for the late response. Surely and obviously, there is always a kind of target/OS dependence in the system header files. And yes, the compiler also has some architecture-dependent definitions, e.g. size_t in the case of GCC.

    Regarding the portability technology, I’m not sure where you have your information from, but that’s exactly how the current internal GCC architecture is … For example, take one of the most recent optimization features, Link Time Optimization (LTO), which theoretically allows you to compile/link source code from various different languages. This only works if there is a common language-independent IR …

    BTW, I would be happy to discuss this further with you,
    Andi
