Tweak CI settings
Our current CI configuration has two issues described below.
## ccache is used ineffectively
`gitlab-runner` cache is effectively a ZIP file passed between jobs. Creating a ZIP archive of a ccache directory, which can contain hundreds of thousands of files taking up several gigabytes of storage space, is a bad idea. Our current CI configuration also causes the ccache directory to be treated as a build artifact, which only makes things worse.
What we should do instead is stop using the `cache` directive altogether, create a ccache directory on each runner, and mount it inside containers. This will allow ccache data to persist between jobs without having to be packed into a ZIP file at the end of each job.
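A minimal sketch of what this could look like in each runner's `/etc/gitlab-runner/config.toml`, assuming the Docker executor; the host path `/var/cache/ccache` and the in-container mount point `/ccache` are placeholders to be adjusted per runner:

```toml
[[runners]]
  # Point ccache inside containers at the mounted directory
  # (placeholder value; must match the volume mount below).
  environment = ["CCACHE_DIR=/ccache"]
  [runners.docker]
    # Mount a persistent host directory into every container; ccache data
    # then survives container teardown without any ZIP packing/unpacking.
    volumes = ["/var/cache/ccache:/ccache:rw"]
```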
As a side note for posterity, it is critical for `/etc/gitlab-runner/config.toml` to contain `volumes = ["/cache"]` for the `cache` directive to actually work, at least with the Docker executor. Otherwise, `gitlab-runner` will just create a ZIP file when a job is finished, store it in `/cache/<namespace>/<project>`, and then tear the container down, obliterating the cache it just created. Adding the `volumes` line mentioned above causes `/cache` to be persisted.
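For reference, the relevant fragment (if the `cache` directive were kept in use) would be just:

```toml
[[runners]]
  [runners.docker]
    # Without this, /cache lives inside the container and is destroyed
    # together with it right after the ZIP archive is written there.
    volumes = ["/cache"]
```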
## `make` uses fixed concurrency settings
Our CI jobs currently always use `make -j6` for building and `make -j8` for running system tests. Meanwhile, concurrency settings need to be tweaked separately for each runner to ensure stability. We should modify our `.gitlab-ci.yml` to fetch the number of parallel `make` jobs to use from environment variables, which will subsequently be set by each runner through its configuration file.
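A sketch of the `.gitlab-ci.yml` side; the variable names `BUILD_JOBS` and `TEST_JOBS` are hypothetical, the shell fallbacks preserve today's fixed values, and the `check` target stands in for whatever actually runs the system tests:

```yaml
build:
  script:
    # Use runner-provided concurrency if set; fall back to the current -j6.
    - make -j"${BUILD_JOBS:-6}"

system-test:
  script:
    # Same idea for the test phase, falling back to the current -j8.
    - make -j"${TEST_JOBS:-8}" check
```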
There are three rules I can think of for tweaking concurrency-related settings:
- With ccache being used, building becomes more (though not fully, of course) I/O-bound than CPU-bound.
- As a rule of thumb, it should be assumed that each parallel system test being executed uses about 0.5 GB of RAM.
- Currently, total system test execution time plateaus around `make -j8` because some tests just take a long time to run.
Apart from the above, the number of concurrent jobs `make` is allowed to use while building and testing has to be considered together with the `concurrent` setting in `/etc/gitlab-runner/config.toml`, which limits the number of CI jobs allowed to be run concurrently at any time on a given runner.
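On the runner side, both knobs would then live in `/etc/gitlab-runner/config.toml`; the values below are purely illustrative, and `BUILD_JOBS`/`TEST_JOBS` are the hypothetical variable names assumed above:

```toml
# At most two CI jobs may run at the same time on this runner.
concurrent = 2

[[runners]]
  # Injected into every job's environment; picked up by .gitlab-ci.yml.
  environment = ["BUILD_JOBS=4", "TEST_JOBS=8"]
```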
Considering the above, an optimal CI machine would have:
- 8+ CPU cores: 8 cores allow concurrent compilation for two OS images using `make -j4` without putting too much strain on the host (ccache storage is not infinite, so we cannot rely on everything being cached beforehand), which sounds like a bare minimum; more cores would enable quicker pipeline completion in case multiple branches and/or more OS images are to be processed concurrently,
- fast storage (or lots of RAM, so that ramdisks can be used for ccache data and/or compilation), to let ccache shine,
- more than 8 GB of RAM: with tests for two OS images running concurrently, each image using `make -j8`, peak RAM utilization around 8 GB may occur (2 * 8 * 0.5 GB), which would lead to swapping; the more RAM the machine has, the more concurrent test-phase jobs (e.g. for different branches and/or different OS images) it will be able to handle (though more RAM by itself will not cause system tests to complete faster, because increasing the number of parallel jobs used while testing beyond 8 has virtually no effect).
In any case, there are a few trade-offs to consider here, so allowing runner-specific concurrency settings in our `.gitlab-ci.yml` sounds like a good idea no matter what machine(s) we will be using for CI in the end.