4. Automation through Makefiles#

Automating computational (and repetitive) tasks is central to the efficient use of a computer. Furthermore, once a task is automated, the automation itself becomes documentation of how the task works, which helps to maintain and share it. Of course, there is always a balance between automating to be efficient and trying to automate too much:

From: https://xkcd.com/1319/

For our case, we will focus on a simple yet common computational task: generating a figure from data produced by a cpp file. Our workflow is basically creating the program -> compiling -> running to produce data -> plotting. What happens if there is a change in the program? The whole workflow must be rerun. But what if only the plot script changes? Then only replotting is needed. A tool like make allows you to establish dependencies among objects and to run commands only when some dependency changes. The make tool has been around for a very long time, is well tested, and although its language can be cumbersome, it is good enough for simple cases like the ones we need. There are other kinds of ‘automation’, like infrastructure as code, or configuration as code (see terraform, opentofu, ansible, puppet, salt, and so on). But always remember

From: https://xkcd.com/1205/

Makefiles are used everywhere, from small projects up to huge ones like the Linux kernel: https://www.kernel.org/ .

4.1. What is a Makefile#

A makefile is a text file where you declare targets and their dependencies, which lets you automate the compilation process by following given rules; for large projects this can actually speed up compilation significantly, since only outdated targets are rebuilt. You can also use a Makefile for LaTeX projects and other related tasks where you want to create an updated version of a document automatically after changing a small part. A rule in a makefile is written as

target : dependencies separated by space
    rule to build target (from dependencies if needed)
# Example:
#sumupdown.x : sumupdown.cpp
#   g++ sumupdown.cpp -o sumupdown.x

It is important to note that the rule (the command line) must be indented with a real tab, not with spaces.
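As a minimal sketch of a complete rule (assuming GNU make is installed; the file and directory names here are invented for the demo), you can try the following in a scratch directory. Note the \t in the printf format, which produces the required tab:

```shell
# Create a scratch directory with a one-rule Makefile.
mkdir -p /tmp/make-demo && cd /tmp/make-demo
# The command line must start with a tab, hence the \t.
printf 'hello.txt :\n\techo hello > hello.txt\n' > Makefile
make        # first run: executes the rule and creates hello.txt
make        # second run: the target exists and is up to date, nothing happens
cat hello.txt
```

Running make a second time does nothing because the target already exists and has no outdated dependency.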

As a more complete example, consider the following dependency graph

%load_ext nb_js_diagrammers
    
%%mermaid_magic -h 350 
graph TD
    A[main.exe] --> B[main.o]
    A --> C[utils.o]
    B --> D[main.c]
    B --> E[defs.h]
    C --> F[utils.c]
    C --> G[utils.h]
    C --> E

And the following shows what happens when a given file changes (or does not):

%%mermaid_magic -h 200
flowchart TB
    subgraph "Changes Ripple Through Dependencies"
        direction LR
        FileA[utils.h -> modified] -->|triggers| FileB[utils.o -> rebuilt]
        FileB -->|triggers| FileC[main -> rebuilt]
        FileB -->|triggers| FileD[tools -> rebuilt]
        
        style FileA fill:#ff9999
        style FileB fill:#ffcc99
        style FileC fill:#99ccff
        style FileD fill:#99ccff
    end
    
    subgraph "Unchanged Files"
        direction LR
        File1[utils.h -> unchanged] --- File2[main -> no rebuild]
        
        style File1 fill:#ccffcc
        style File2 fill:#ccffcc
    end
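The ripple effect in the diagram above can be simulated with make itself. This is a sketch assuming GNU make; cat stands in for the compiler and linker, and the file names are invented for the demo:

```shell
mkdir -p /tmp/ripple-demo && cd /tmp/ripple-demo
echo 'int x;' > utils.h
# utils.o depends on utils.h, and main.out depends on utils.o
printf 'main.out : utils.o\n\tcat utils.o > main.out\n\nutils.o : utils.h\n\tcat utils.h > utils.o\n' > Makefile
make            # builds utils.o, then main.out
make            # everything up to date, nothing to do
touch utils.h   # simulate editing the header ...
make            # ... and both utils.o and main.out are rebuilt
```

The touch only updates the timestamp of utils.h, yet that is enough for make to rebuild the whole chain that depends on it.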

There are some ‘modern’ tools/alternatives, with different scope, that you can check:

Note: For more information about make, please see:

4.2. Application example: Final figure creation#

Imagine that you want to automate the creation of the figure for the sumup and sumdown exercise from the numerical errors section. In this case you should think about the dependencies: the final figure will depend on a given gnuplot (or python) script (script.gp) that reads data from a given file (data.txt), so any change in those files should trigger an update of the figure. Furthermore, the data file actually depends on the executable (sumupdown.x), since any change there may imply new data. And the executable depends directly on the source code (sumupdown.cpp). We can express these dependencies as follows.

Therefore we have:

  • The executable depends on the source code: if the source changes, the executable must be recompiled. This is the typical usage, keeping executables up to date.

      sumupdown.x: sumupdown.cpp
          g++ sumupdown.cpp -o sumupdown.x
    
  • The data depends on the executable: if it changes, then the data must be generated again. Changes in the executable (or, transitively, in the source code) will trigger this rule.

      data.txt: sumupdown.x
          ./sumupdown.x > data.txt
    
  • The final figure depends on the plotting script and the data

    fig.pdf: script.gp data.txt
        gnuplot script.gp
    

    Through the dependency tree formed, changes in the source, the executable, or the data will trigger this rule.

Our final simple makefile will look like the following; save it into a file called Makefile

fig.pdf: script.gp data.txt
    gnuplot script.gp

data.txt: sumupdown.x
    ./sumupdown.x > data.txt

sumupdown.x: sumupdown.cpp
    g++ sumupdown.cpp -o sumupdown.x

The actual order of the rules is not important for the dependency resolution (although the first target is the default one built when you run make without arguments). Every time you run make it will check if any target is outdated and run the commands needed. The following code shows a simple starting point for the sum up and down. You need to actually implement the functions, parametrize the data type, and also read the limit from the command line.

#include <iostream>
#include <cmath>

typedef float REAL; 

REAL sumup(int nterms);
REAL sumdown(int nterms);

int main(int argc, char **argv)
{
  std::cout.precision(6); std::cout.setf(std::ios::scientific);
  for (int ii = 1; ii <= 10000; ++ii) {
    REAL sum1 = sumup(ii);
    REAL sum2 = sumdown(ii);
    std::cout << ii << "\t" << sum1 << "\t" << sum2
              << "\t" << std::fabs(sum1-sum2)/sum2 << "\n";
  }

  return 0;
}

REAL sumup(int nterms)
{
    REAL result = 0;
    // todo: accumulate the terms from the first to the last
    return result;
}

REAL sumdown(int nterms)
{
    REAL result = 0;
    // todo: accumulate the terms from the last to the first
    return result;
}

And this is a simple gnuplot script example (you can also use matplotlib or any tool that can be called from the command line)

set term pdf
set out "fig.pdf"
# set xlabel "Nterms"
# set ylabel "Percentual difference"
plot 'data.txt' u 1:4 w lp pt 4

Now just run make and see what happens. Then uncomment, for example, the commented lines inside the gnuplot script and run make again. Check that make just reruns the needed rule: it does not recompile the whole program.
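You can also inspect what make would do without running anything, using make -n (a dry run). The sketch below (assuming GNU make; touch creates empty stand-in files, so no compiler or gnuplot is actually executed) reproduces the three rules above and shows that touching only script.gp leaves just the plotting step pending:

```shell
mkdir -p /tmp/fig-demo && cd /tmp/fig-demo
printf 'fig.pdf: script.gp data.txt\n\tgnuplot script.gp\n\ndata.txt: sumupdown.x\n\t./sumupdown.x > data.txt\n\nsumupdown.x: sumupdown.cpp\n\tg++ sumupdown.cpp -o sumupdown.x\n' > Makefile
touch sumupdown.cpp script.gp
make -n                              # dry run: lists all three commands
touch sumupdown.x data.txt fig.pdf   # pretend everything was built
make -n                              # nothing to do: fig.pdf is up to date
touch script.gp                      # "edit" only the plot script
make -n                              # now only the gnuplot command is listed
```

make -n is a convenient way to debug a dependency chain before running expensive commands for real.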

4.3. Adding some internal make vars#

Actually, the makefile can be simplified and generalized using variables

fig.pdf: script.gp data.txt
    gnuplot $<

data.txt: sumupdown.x
    ./$< > $@

# Generic rule to compile any cpp into its corresponding .x 
%.x: %.cpp
    g++ $< -o $@

Explanation:

  • $< : This is the first dependency

  • $^ : All dependencies

  • $@ : This is the rule target

  • %.x: %.cpp: A pattern rule: any file ending in .x can be built from the corresponding .cpp file

And some others:

  • $(@D) the directory part of the target

  • $(@F) the file part of the target

  • $(<D) the directory part of the first prerequisite (i.e., dependency)

  • $(<F) the file part of the first prerequisite (i.e., dependency)

Also, if you separate your sources from the executables (usually a good idea), you can specify targets as

build/%.o: src/%.c
	gcc -c $< -o $@
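A sketch of such an out-of-tree pattern rule in action (assuming GNU make; cat plays the role of the compiler, and the directory and file names are invented for the demo):

```shell
mkdir -p /tmp/pat-demo/src /tmp/pat-demo/build && cd /tmp/pat-demo
echo 'int f();' > src/a.c
# any %.o under build/ is made from the matching %.c under src/
printf 'build/%%.o: src/%%.c\n\tcat $< > $@\n' > Makefile
make build/a.o
cat build/a.o        # same content as src/a.c
```

Here $< expands to src/a.c and $@ to build/a.o, so one rule covers every source file.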

4.4. Troubleshooting Common Make Issues#

4.4.1. Syntax Errors#

  • Tabs vs. Spaces: Make requires tabs for indentation. If you get *** missing separator errors, check that command lines begin with tabs, not spaces.

  • Missing Colons: Every rule must have a colon after the target name.

4.4.2. Circular Dependencies#

Error: Circular filename1 <- filename2 dependency dropped

Problem: Your rules form a loop where A depends on B and B depends on A.

Solution: Restructure your dependencies to form a directed acyclic graph.

4.4.3. Files Not Being Rebuilt#

Problem: Changes to a file aren’t triggering rebuilds of dependent targets. Possible causes:

  • Missing dependency declaration

  • Incorrect timestamp (use touch to update)

  • Pattern rule not matching as expected

Diagnosis: Run make -d for debug output showing Make’s decision process.

4.4.4. Phony Targets#

Problem: A command does not run because a file with the target name exists. Solution: Declare the target as .PHONY: target_name

4.4.5. Environment Variables#

Problem: Make isn’t using the expected environment variables. Solution: Use export VAR=value in the Makefile or run export VAR=value before running make.

4.4.6. Wildcard Pitfalls#

Problem: Wildcard (*) not expanding as expected. Solution: Use $(wildcard *.c) instead of *.c for reliable expansion.
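A quick check of the wildcard function (a sketch assuming GNU make; file names invented for the demo):

```shell
mkdir -p /tmp/wild-demo && cd /tmp/wild-demo
touch a.c b.c
# SRCS is expanded by make itself, not by the shell
printf 'SRCS = $(wildcard *.c)\n\nshow:\n\t@echo $(SRCS)\n' > Makefile
make show            # echoes the matched files, e.g. a.c b.c
```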

4.5. Application example: many files to compile#

Now, let’s try to test our functions. To do so, we need another main function, to separate our actual use from the tests. Therefore, it is better to put the function declarations and implementations into their own files. You will now have three files:

  • functions.h: It will have the function declarations. Add an include guard with pragma or define.

  • functions.cpp: It will have the function implementations (and will include functions.h).

  • main.cpp: This has the main function and includes the header functions.h. You can create as many main files as you want: one for using the functions, another for testing, etc.

Your first task is to move the declarations into functions.h, and the implementations into functions.cpp. Remember to include functions.h both in the implementation file and in the main file.

The second task is to update the Makefile to compile the whole thing. The manual command is

# this creates the object functions.o
g++ -c functions.cpp
# this creates the object main.o
g++ -c main.cpp
# this links with main and creates executable main.x
g++ main.o functions.o -o main.x

What could be the corresponding make rules to build the objects and the executable? Here is an example for the latter

main.x : main.o functions.o
    g++ -o main.x main.o functions.o

Test this with the old main function. Where are the rules for creating the object files? The objects are created automatically. This is done through make’s built-in implicit rules.
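You can see such a built-in rule with a dry run, without even writing a Makefile (a sketch assuming GNU make with its default rules; the exact command printed may vary slightly between versions):

```shell
mkdir -p /tmp/implicit-demo && cd /tmp/implicit-demo
echo 'int main(){ return 0; }' > main.cpp
# No Makefile at all: make already knows how to build a .o from a .cpp
make -n main.o
```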

4.5.1. Exercise#

To test the usefulness of this:

  • Change the functions.cpp file a bit and recompile with make. Are all objects updated?

  • Create another main file called test.cpp which tests something about your functions (like calling them with negative limits). Add the rule to the makefile, so you can just write make test to test the functions.

4.6. Using variables in the makefile#

After seeing these advantages, let’s try to generalize and use more useful make syntax. For instance, you can specify variables, which can be overridden on the command line. Something like

CXX=g++
CXXFLAGS=-I.

main.x: foo.o bar.o
  $(CXX) $(CXXFLAGS) -o main.x main.cpp foo.o bar.o

Compile again and test. Everything should work.
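Overriding such a variable from the command line looks like this (a sketch assuming GNU make; the target name and paths are invented for the demo):

```shell
mkdir -p /tmp/var-demo && cd /tmp/var-demo
printf 'CXX = g++\n\nshow:\n\t@echo using $(CXX)\n' > Makefile
make show              # prints: using g++
make show CXX=clang++  # the command-line value wins: using clang++
```

This is why defining CXX and CXXFLAGS as variables pays off: users can switch compiler or flags without editing the Makefile.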

Now, we will store the names of the objects in a variable. Furthermore, we use $@ to refer to the name of the target, and $^, which stands for all the items in the dependency list

CXX = g++
CXXFLAGS = -I.
OBJ = foo.o bar.o

main.x: $(OBJ)
  $(CXX) $(CXXFLAGS) -o $@ $^ main.cpp

Save and run it; you will still get the same result, but the makefile is becoming more generic, and therefore more useful.

Now let’s specify the generic rule to create the .o files, and also specify that if we change the .h files then a recompilation is needed (for now our makefile only detects changes in .cpp files):

CXX = g++
CXXFLAGS = -I.
OBJ = foo.o bar.o
DEPS = foo.h bar.h

%.o: %.cpp $(DEPS)
  $(CXX) -c -o $@ $< $(CXXFLAGS)

main.x: $(OBJ)
  $(CXX) $(CXXFLAGS) -o $@ $^ main.cpp

Here we have specified the dependency on DEPS, and we are using a generic (pattern) rule to tell make how to create any .o file (the generic %.o target) from the corresponding %.cpp file; $< means the first item in the dependency list. Again, save it and run it.

You can add rules to clean the directory (to erase object files, for example), to add libs, to put the headers in another directory, etc. For example, the following adds a phony rule (which does not create any target file) to clean the directory

CXX = g++
CXXFLAGS = -I.
OBJ = foo.o bar.o
DEPS = foo.h bar.h

%.o: %.cpp $(DEPS)
  $(CXX) -c -o $@ $< $(CXXFLAGS)

main.x: $(OBJ)
  $(CXX) $(CXXFLAGS) -o $@ $^ main.cpp

.PHONY: clean
clean:
  rm -f *.o *~ *.x

In order to run this rule, you need to write

make clean

(or make -f makefile-v5 clean, if you saved this version under the name makefile-v5 instead of Makefile).
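The effect of .PHONY can be seen when a file with the target’s name happens to exist (a sketch assuming GNU make; names invented for the demo):

```shell
mkdir -p /tmp/phony-demo && cd /tmp/phony-demo
touch clean foo.o      # note the file literally named "clean"
printf '.PHONY: clean\nclean:\n\trm -f *.o\n' > Makefile
make clean             # runs anyway, because clean is declared phony
ls foo.o 2>/dev/null || echo "foo.o was removed"
```

Without the .PHONY line, make clean would report that ‘clean’ is up to date and never run the rm command.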

And, finally, here is a more complete example with comments and new syntax, which you can adapt for your own needs

CXX = g++
CXXFLAGS = -I.
LDFLAGS =
SOURCES = main.cpp foo.cpp bar.cpp
#SRC_FILES := $(wildcard *.cpp) # better
OBJ = $(SOURCES:.cpp=.o) # extracts automatically the objects names
DEPS = foo.h bar.h

all : main.x $(SOURCES) $(DEPS) # this is the default target

main.x: $(OBJ)
  @echo "Creating main executable ..."
  $(CXX) $(CXXFLAGS) -o $@ $^ $(LDFLAGS)

.cpp.o:
  $(CXX) -c -o $@ $< $(CXXFLAGS)

.PHONY: clean
clean:
  rm -f *.o *~ *.x

4.7. Exercises#

  • Take the codes for overflow and underflow and automate the detection of overflow with the standard function isinf. Once they detect overflow automatically, stopping and printing where the overflow occurs, put them in separate files (.h and .cpp). Then create a main file which calls those functions. Now write a makefile to automate the compilation. Use the latest version of the example Makefile. When everything is working, send it to the assigned repository. Do the same now including the code which computes the machine eps, using its own .h and .cpp files.

  • If you use LaTeX, how would you write a Makefile for that? Well, in the end you can use latexmk, of course. Try to think of other uses for make (like automatic zipping and mailing of some important file).

  • Extend your Makefile to support three build modes:

    • DEBUG: Enables debugging symbols and disables optimization

    • RELEASE: Enables optimization and disables debug symbols

    • PROFILE: Enables both profiling and debugging symbols

    The mode should be selectable by setting an environment variable or passing it to make.

  • Modify your Makefile to automatically discover and track .h dependencies without manually listing them. Hint: Look into gcc’s -M, -MM, -MMD options and include the generated dependency files. It is something like the following (check also https://www.gnu.org/software/make/manual/html_node/Automatic-Prerequisites.html)

    SRCS = main.c utils.c
    OBJS = $(SRCS:.c=.o)
    DEPS = $(SRCS:.c=.d)
    
    %.o: %.c
        gcc -MMD -c $< -o $@
    
    -include $(DEPS)
    
  • Analyze and fix a provided Makefile so it correctly supports parallel builds with ‘make -j’. Identify race conditions and implement proper order-only prerequisites.

  • Create a build system for a project with the following structure:

    • /project

    • /lib - Contains a utility library

    • /app - Contains the main application that uses the library

    • /tests - Contains test files for both the library and application

    Create a main Makefile and appropriate sub-Makefiles that allow building everything, just the library, just the application, or running tests.

  • Extend your Makefile to:

    • Generate documentation using Doxygen

    • Run tests with different levels (unit, integration)

    • Package the application for distribution

    • Deploy to a specified location

  • Refactor a verbose Makefile using Make functions like $(foreach), $(patsubst), $(shell), $(call), and conditional assignment operators (?=, :=, +=, etc.) to make it more concise and maintainable.