Code to Binary: An In-Depth Look at the C++ Building Process

23 Apr, 2023

From Source Code to Binary: Exploring the Process of C++ Compilation/Building.

Mikelis' Game Blog

In this article, we will take a step back from our usual high-level discussions and delve into the nitty-gritty of building a single .cpp file for a Windows target using the GNU Compiler Collection (GCC). Instead of relying on compilation shells, integrated development environments (IDEs), or extensive installable toolchains, we will aim to demystify the fundamental process of converting C++ code into an executable binary using a compiler — a process often concealed by today's advanced development environments. My goal with this article is to empower you to compile C++ code to different stages and see what they look like, as well as to paint a comprehensive high-level picture of how a C++ compiler works.

While this article is tailored for readers using Windows 10 or later on 64-bit machines, it remains accessible for users on different systems. With a little Googling or GPT-ing, you should be able to follow along as GCC is cross-platform, and we will not be utilizing any Windows-specific code or features not available in other operating systems.

The Building Process Overview

Before we dive in, let's take a moment to review the process of translating C++ code into machine code at a high level. The classic build process involves five main steps:

1. Coding. This is the core of software engineering: writing code in .cpp source files and .h header files. Naturally, the C++ programmer indirectly decides much of the machine code that will be built.
2. Preprocessing. During this stage, the preprocessor executes directives like #include or #define, merging the contents of included files directly into the .cpp files. Each expanded .cpp file is then a single translation unit. When GCC saves preprocessed output to disk, it conventionally uses the extension .i for C and .ii for C++.
3. Compiling. Each translation unit is transformed into assembly code, the last somewhat human-readable representation of the algorithms and data in our code. Compiler-level optimizations occur during this step, resulting in an assembly instruction file (either .s or .asm, depending on the compiler).
4. Assembling. The assembly code is converted into binary instructions that the target processor can execute. This binary data is stored in an object file (either .o or .obj, depending on the compiler) and is no longer human-readable.
5. Linking. Multiple object files are linked together so that symbols defined in one translation unit can be used from another. The linker also generates the final binary executable or dynamic library.

Because this is an Unreal Engine blog, I should mention that the Unreal Header Tool will run before the second step to do minimal static code analysis and code generation for the included .generated.h files.

It should also be mentioned that unity builds (also called unified or jumbo builds) amalgamate multiple translation units into one for faster compilation. This particularly speeds up compiling very large projects where the same headers are included in many units. But we will not explore this today.

The terms "building" and "compiling" are often used interchangeably to describe everything after step 1. For the sake of clarity, we will use the term "building" in this article. Additionally, we will use the term "compiler" to refer to software that performs the building process. Now, let's examine this process in greater detail.

The Building Process in Detail

Writing the Code

We won't need an IDE or complex code for our exploration of the building process. The following should be enough. Feel free to simply paste this text into a .cpp file like C:\Dev\Project\Main.cpp:

#include <iostream>

int main() 
{
    std::cout << "Hello, World!";
    std::cin.get(); // Wait for ENTER.
}

Setting up the Compiler

To keep things simple, we will use a development toolchain for Windows called WinLibs. WinLibs is a standalone and portable package of MinGW, which includes a GCC port for Windows. GCC is a well-established set of preprocessors, compilers, assemblers, and linkers that supports multiple programming languages. Because GCC was designed for Unix-like systems, it needs a port or compatibility layer such as MinGW or Cygwin to run on Windows. Collectively, packages of development tools like this are known as toolchains.

Download the latest UCRT version of GCC & MinGW for Win64 without LLVM, Clang, LLD, and LLDB. Extract the contents of the mingw64 directory into a folder like C:\Dev\Toolchain.

For those using Linux or macOS, GCC should be available in your package manager. As these operating systems are Unix-like, MinGW is not required.

A Closer Look at GCC

The GNU Compiler Collection (GCC) is a versatile and widely used suite of preprocessors, compilers, assemblers, and linkers that supports various programming languages, such as C, C++, Objective-C, and even Fortran. Although GCC is a popular choice for C++ development, there are other compiler toolsets worth exploring. Clang, for instance, is known for its rapid compilation speed and outstanding diagnostic messages, though on Windows it generally still relies on the headers, libraries, and linker of an MSVC or MinGW/GCC installation. Another alternative is the Intel C++ Compiler, which provides Intel-specific processor optimizations that can enhance performance on certain Intel-based systems. In game development, MSVC, the toolset included with Microsoft's Visual Studio, is frequently used.

Building

Assuming your .cpp file is in C:\Dev\Project\Main.cpp and the MinGW toolchain with GCC is in C:\Dev\Toolchain, open your command prompt and set the PATH environment variable:

> SET PATH=C:\Dev\Toolchain\bin;%PATH%

Then compile the .cpp file into a binary executable:

> g++ C:\Dev\Project\Main.cpp -Wall -static -o C:\Dev\Project\Program.exe

If the build succeeds without errors or warnings, you won't see any output in the command prompt. If there are errors, or warnings (the -Wall flag enables a broad set of them), the output will resemble what you'd get when building with an IDE.

Now, run Program.exe and confirm that it prints "Hello, World!" correctly. This is really all you need to build C++ code: some code and some compiler binaries. But let's investigate the building process a bit more.

Preprocessing as a Separate Step

To output preprocessed code with g++, use the following command:

> g++ -E C:\Dev\Project\Main.cpp -o C:\Dev\Project\Main.ii

You can then compile the preprocessed code into an executable:

> g++ C:\Dev\Project\Main.ii -Wall -static -o C:\Dev\Project\Program.exe

Do be aware that preprocessed code is massive as it is effectively a .cpp file with headers included and other preprocessor directives executed to transmogrify the file. It is still very human-readable C++, however. I encourage opening the Main.ii and looking into it yourself!
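To make the preprocessor's work easier to spot in a .ii file, it can help to add a few directives of your own. The following is a small sketch rather than part of the original example; the macro and function names (BUFFER_SIZE, SQUARE, USE_LARGE_BUFFER, squareArea, bufferCapacity) are made up for illustration.

```cpp
#include <cassert>

// Object-like macro: every later use of BUFFER_SIZE is replaced with 64.
#define BUFFER_SIZE 64

// Function-like macro: expanded as plain text, hence the extra parentheses.
#define SQUARE(x) ((x) * (x))

// Conditional compilation: only one of these branches survives preprocessing.
// USE_LARGE_BUFFER is a hypothetical switch; it is not defined here.
#ifdef USE_LARGE_BUFFER
constexpr int kCapacity = BUFFER_SIZE * 4;
#else
constexpr int kCapacity = BUFFER_SIZE;
#endif

int squareArea(int side)
{
    // assert is itself a macro, so this line also looks different in the .ii file.
    assert(side >= 0);
    // After preprocessing, this reads: return ((side) * (side));
    return SQUARE(side);
}

int bufferCapacity()
{
    // kCapacity is a real C++ constant, so it is left untouched by the preprocessor.
    return kCapacity;
}
```

If you paste this above main() in Main.cpp and rerun the -E command, the end of Main.ii should show these functions with the macros already expanded and the unused #ifdef branch gone.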

Assembly as a Separate Step

To produce assembly code for our translation unit, run:

> g++ -S C:\Dev\Project\Main.cpp -o C:\Dev\Project\Main.s

You can then continue the compilation process from the assembly instructions:

> g++ C:\Dev\Project\Main.s -static -o C:\Dev\Project\Program.exe
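The assembly generated for the iostream example is fairly long, so a couple of tiny functions may be easier to study. This is a sketch with made-up function names (addNumbers, scaleBy); the mangled name in the comment is what I would expect from the Itanium C++ ABI that GCC follows, so treat it as an assumption and check your own .s file.

```cpp
#include <cassert>

// extern "C" switches off C++ name mangling for this function, so its label
// in the assembly output can be found by searching for "addNumbers".
extern "C" int addNumbers(int a, int b)
{
    return a + b;
}

// Without extern "C", the symbol name encodes the parameter types.
// With GCC this should look something like _Z7scaleByii.
int scaleBy(int value, int factor)
{
    // Unless NDEBUG is defined, this check is compiled into the function,
    // so it is visible in the assembly as well.
    assert(factor != 0);
    return value * factor;
}
```

Compiling a file with these functions using -S, once as is and once with an optimization flag such as -O2 added, is a quick way to see how much the compiler's optimizations change the emitted instructions.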

Linking as a Separate Step

To build object files for translation units, use commands like:

> g++ C:\Dev\Project\Main.cpp -Wall -c -o C:\Dev\Project\Main.o

Then, link one or multiple object files into an executable file like so:

> g++ C:\Dev\Project\Main.o -static -o C:\Dev\Project\Program.exe
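Linking matters most once a program spans several translation units. The sketch below keeps everything in one file so it compiles on its own, but the comments mark where each part would live in a real project. Score.h, Score.cpp, and all the names in it are hypothetical and not part of the original example.

```cpp
#include <cassert>

// --- Would live in a header, e.g. a hypothetical Score.h ---
// Declarations only: they promise the compiler that these symbols exist somewhere.
int computeScore(int kills, int assists);
extern int g_bonusPoints;

// --- Would live in Main.cpp ---
// When compiled separately, Main.o only records that it needs computeScore
// and g_bonusPoints. The linker fills in their addresses later.
int finalScore(int kills, int assists)
{
    return computeScore(kills, assists) + g_bonusPoints;
}

// --- Would live in a second translation unit, e.g. a hypothetical Score.cpp ---
// Definitions: this is where the symbols actually get storage and code.
int g_bonusPoints = 10;

int computeScore(int kills, int assists)
{
    assert(kills >= 0 && assists >= 0);
    return kills * 100 + assists * 50;
}
```

Split across two files, each would be compiled with -c into its own object file, and both object files would be passed to the final g++ command. Leaving out the second object file should make the linker fail with an error along the lines of "undefined reference to computeScore(int, int)".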

Manual Step-by-step Compiling

For those who prefer a hands-on approach, the following commands will compile our single translation unit step-by-step. Keep in mind that a compiler typically performs these steps in sequence automatically, and it only keeps the intermediate files when asked to.

First, build the preprocessed code for the translation unit:

> g++ -E C:\Dev\Project\Main.cpp -o C:\Dev\Project\Main.ii

Note that the .ii file is really a C++ text file and can be looked into with a text editor.

Next, compile the preprocessed code into assembly instructions:

> g++ -S C:\Dev\Project\Main.ii -Wall -o C:\Dev\Project\Main.s

After that, assemble the instructions into machine code stored in an object file:

> g++ C:\Dev\Project\Main.s -c -o C:\Dev\Project\Main.o

Finally, link the object representing the translation unit and the libraries it uses into an executable binary:

> g++ C:\Dev\Project\Main.o -static -o C:\Dev\Project\Program.exe

Hurray! You made all the intermediate files commonly used in compilation yourself and then built the final executable.
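If you find yourself repeating these four commands, they are easy to generate programmatically. The following is only a sketch of that idea, not a real build system: buildStageCommands is a hypothetical helper, and it assumes paths without spaces and g++ being available on the PATH.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the four command lines used above, in order: preprocess, compile,
// assemble, link. 'stem' is the source path without its extension, for
// example "C:\\Dev\\Project\\Main". Paths containing spaces would need quoting.
std::vector<std::string> buildStageCommands(const std::string& stem,
                                            const std::string& exePath)
{
    assert(!stem.empty() && !exePath.empty());
    return {
        "g++ -E " + stem + ".cpp -o " + stem + ".ii",      // .cpp -> .ii
        "g++ -S " + stem + ".ii -Wall -o " + stem + ".s",  // .ii  -> .s
        "g++ " + stem + ".s -c -o " + stem + ".o",         // .s   -> .o
        "g++ " + stem + ".o -static -o " + exePath,        // .o   -> .exe
    };
}
```

Each returned string could be printed for inspection or passed to std::system from <cstdlib>, stopping at the first command that returns a non-zero exit code.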

The g++ compiler can also produce all these intermediate files in one go with commands such as:

> g++ C:\Dev\Project\Main.cpp -save-temps -Wall -static -o C:\Dev\Project\Program.exe

Depending on the GCC version and options, the temp files are written either next to the output file or into the current working directory, and their names may carry a prefix derived from the output name. In this example, look in C:\Dev\Project or in the directory you ran the command from. The .ii files contain preprocessed C++ code, the .s files contain assembly instructions and data, and the .o ("object") files contain machine code and data ready for linking.

Static and Dynamic Linking

We have been using the g++ linker flag -static, which links libraries statically, including them in our final executable. This makes the executable quite large, but it does not require dynamically linked libraries (DLLs) to be present on the system at run-time. Windows provides many DLLs, so we may prefer to link those dynamically to reduce the executable size and pick up up-to-date Windows components. To link only specific libraries statically, use the -Wl,-Bstatic [-l<lib_name> ...] -Wl,-Bdynamic flag sequence (note that -Wl ends in a lowercase letter L, not the digit 1). For example:

> g++ C:\Dev\Project\Main.cpp -Wall -static-libgcc -Wl,-Bstatic -lstdc++ -lpthread -Wl,-Bdynamic -o C:\Dev\Project\Program.exe

All libraries that are not statically linked must be present at run-time as .dll files, either alongside our executable or in a directory listed in the PATH environment variable.

Conclusion

In this article, we delved into the process of building a single .cpp file with GCC on Windows, bypassing the need for an IDE, development shells, or system-wide toolchains. We took a somewhat deep dive into the compilation process, investigating the intermediate files generated by GCC at each stage to gain a clearer understanding of how it all works.

I hope you found some valuable insights in this article, whether about GCC, the compilation steps, or the realization that complex development environments aren't always necessary to build C++ applications.

Further Learning

For an alternative perspective on the compilation process, I recommend watching The Cherno's C++ series on YouTube.

#Assembling #Building #C++ Building #C++ Compilation #Compiling #Cpp #GCC #Linking #MinGW #Preprocessing #Toolchain #UHT #WingowsGCC