Build Project Across Multiple Machines

**Elkvis** · 06-14-2016

I have a large project that takes about 20 minutes for a clean build, even with a parallel build on a 16-core machine, and I'm always looking for ways to improve that. I'm looking for a way to spread that parallel build across multiple machines, so that when I run make, it does some sort of magic: it sends the source file to another machine, runs the compile step there, and copies the .o file back to the local machine when finished, while also displaying the output of the remote compilation on the local terminal.

I've looked for solutions to this, and I'm coming up pretty short. Does anyone know of an existing, open-source solution that will allow me to do this?

The build needs to be controlled from a single instance of make, on the local machine. It doesn't make sense to just run the same build on multiple machines, which is the closest answer I've found to my problem. I want some files to be built on one node, while others are built on a different node, and then everything linked on the local machine. Even if I have to check out the repository on each node, I'm OK with that.
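To make the idea concrete, here is a rough Python sketch of just the "which file goes to which node" part. It is only an illustration of what I have in mind, not an existing tool: the host names, the repository path, and the `plan_remote_compiles` helper are all hypothetical, it assumes the repository is checked out at the same path on every node, and it leaves out copying the .o files back and relaying the compiler output.

```python
import shlex
from itertools import cycle


def plan_remote_compiles(sources, hosts, repo_dir):
    """Assign each source file to a host round-robin.

    Returns a list of (host, argv) pairs, where argv is an ssh command line
    that would compile one file on that host. Nothing is executed here.
    """
    plan = []
    for src, host in zip(sources, cycle(hosts)):
        obj = src.rsplit(".", 1)[0] + ".o"  # foo.cpp -> foo.o
        remote_cmd = "cd {} && g++ -c {} -o {}".format(
            shlex.quote(repo_dir), shlex.quote(src), shlex.quote(obj)
        )
        plan.append((host, ["ssh", host, remote_cmd]))
    return plan


# Hypothetical hosts and checkout path, purely for illustration.
plan = plan_remote_compiles(
    ["a.cpp", "b.cpp", "c.cpp"], ["node1", "node2"], "/srv/build/project"
)
for host, argv in plan:
    print(host, argv)
```

The link step would still run locally once every object file is back on the local machine.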

**Salem** · 06-14-2016

Unless you're in the habit of touching every single source file (and header file), then perhaps this:
https://ccache.samba.org/

**Elkvis** · 06-14-2016

The problem comes when new features are added, requiring tables to be added to a database. I have .h/.cpp pairs of files for each database schema that I access, which are automatically generated from the database metadata, and they contain one class per table, with fields for each column and insert/update/delete methods. The header file is included by literally every source file, so every time it changes, everything must be rebuilt. When switching between development branches, there may be different versions of the files in the various branches. In that scenario, I'm doubtful that a compiler cache would be particularly effective. If I'm wrong, I'd be interested to know how it could be made to work in my situation.

**Salem** · 06-14-2016

Even though the files are regenerated, do they still contain exactly the same content?

I mean, if the schema doesn't change, the generated cpp/h shouldn't change either.

I'm not sure what ccache actually caches. If it's the output of "gcc -E" (the output of the pre-processor), then hopefully comments like "This was autogenerated by ... on ..." would not necessarily cause an otherwise identical source file to be recompiled.
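As a toy illustration of that guess (not how ccache is actually implemented), the Python sketch below keys a cache on the source text with comments stripped, so a changed "autogenerated on" comment gives the same key while a real schema change does not. The `cache_key` helper and the sample `Row` struct are made up for the example, and the comment stripper is deliberately naive: it ignores string literals and does none of the include or macro expansion a real preprocessor would.

```python
import hashlib
import re

# Naive matcher for // and /* */ comments; it does not understand string
# literals, so it is only good enough for a toy example.
_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def cache_key(source):
    """Hash the source with comments removed and whitespace collapsed."""
    stripped = _COMMENT.sub("", source)
    normalized = " ".join(stripped.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


old = "// Autogenerated on 2016-06-13\nstruct Row { int id; };\n"
new = "// Autogenerated on 2016-06-14\nstruct Row { int id; };\n"
changed = "// Autogenerated on 2016-06-14\nstruct Row { int id; int qty; };\n"

print(cache_key(old) == cache_key(new))      # True: only the comment differs
print(cache_key(new) == cache_key(changed))  # False: the struct itself changed
```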

There's this, but I've never set one up myself.
Distributed builds - Jenkins - Jenkins Wiki

**brewbuck** · 06-21-2016

A frequently-changed header file that is included nearly everywhere is a design smell.

**Elkvis** · 06-22-2016

> Originally Posted by brewbuck
>
> A frequently-changed header file that is included nearly everywhere is a design smell.

I generally agree, but I did explain my situation in post #3. The DB classes also contain relationship data - collections of child rows, pointers to parents, etc. Even if I had one file per table/class, adding a new table with foreign keys would necessarily affect other files. In a side-by-side comparison, with all other things being equal, a single header/cpp per schema results in a faster build than file-per-table, probably because it requires less mass storage I/O.

I've played around with precompiled headers, and they seem to cut compilation time down by about 25%. Unfortunately, I have found no convenient way to have them generated automatically, using the CMake build system.

**Elkvis** · 06-23-2016

One thing that I found that improved things, even more than I expected, was to switch to the gold linker. It brought my link time down from about 48 seconds to about 22. I'll take improvements wherever I can find them.

**Epy** · 06-26-2016

No cost: use a ramdisk.
Cost but good investment: SSD.

Edit: should clarify, when building in parallel with CPU power in excess, the bottleneck is clearly the disk I/O. Using an SSD can cut compile times in half easily. Ramdisk is even faster than an SSD, so.

**Elkvis** · 06-27-2016

> Originally Posted by Epy
>
> No cost: use a ramdisk.
> Cost but good investment: SSD.
>
> Edit: should clarify, when building in parallel with CPU power in excess, the bottleneck is clearly the disk I/O. Using an SSD can cut compile times in half easily. Ramdisk is even faster than an SSD, so.

Ramdisk and SSD are great suggestions. Getting an SSD is not really possible, given my environment.

The build machine is a VM running on XenServer, with disk storage on an EMC SAN, connected via 4Gb FC. The EMC has 8GB of cache memory, so performance shouldn't suffer much. The build machine is, by far, the most active machine connected to the SAN, with more than ten times the IOPS of the other VMs combined. Sure, the storage is probably still a substantial bottleneck, but not as much as if it were running on a typical PC.

I can conclusively say that using a ramdisk did not help at all. The difference in build times was not statistically significant.

I tested ramdisk under the following conditions:

- 12GB ramdisk, both tmpfs and ramfs types.
- Builds on code based in normal storage, with TMPDIR set when running make, and also with the project checked out completely inside the ramdisk.
- A control build in normal storage, under normal conditions.

The variation between runs, including the control run, was less than 5 seconds. I take this to mean that my project is getting completely cached in system memory, and the build is not a disk-I/O bound operation.
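For anyone who wants to repeat this kind of comparison, a minimal Python sketch of the measurement might look like the following. It is not the script I actually used: `time_command` and `within_noise` are hypothetical helpers, and the 5-second threshold is simply the run-to-run variation mentioned above.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, extra_env=None):
    """Run cmd once and return the wall-clock time in seconds.

    extra_env is merged over the current environment, e.g. to point TMPDIR
    at a ramdisk for one run and leave it alone for the control run.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    subprocess.run(
        cmd, env=env, check=True,
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def within_noise(timings, threshold_s=5.0):
    """True if the slowest and fastest runs differ by less than threshold_s."""
    return max(timings) - min(timings) < threshold_s


# Smoke test with a trivial command; a real run would time the build instead.
elapsed = time_command([sys.executable, "-c", "pass"])
print("trivial command took {:.3f} s".format(elapsed))
```

A real comparison would call something like `time_command(["make", "-j16"], {"TMPDIR": "/mnt/ramdisk/tmp"})` after a clean, where the ramdisk path is a placeholder, and then pass the collected timings to `within_noise`.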

**Epy** · 06-28-2016

Makes sense, I guess. I was thinking along the lines of compile performance on a typical PC, as you mentioned. I had the same sort of wrestling match with compiling OCE several years ago; upgrading to an SSD cut my compile time in half. Compiling with Clang instead of GCC helped too.

Not sure if you'd want to, or if it would even be a good idea for your scenario, but what I ultimately ended up doing was using droplets from DigitalOcean, since I could use tons of cores and they have SSD drives. You can temporarily create as many droplets as you want, install dependencies quickly from their local repositories, do your build, and send files back to your PC. Costs less than a buck depending on how often you want to do that. I did a little performance review back when I was testing it; the sweet spot on their machines is between the 8- and 12-core options.

Hazudra Fodder: DigitalOcean CPU and I/O performance
