Thread: Build Project Across Multiple Machines

  1. #1
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445

    Build Project Across Multiple Machines

    I have a large project that takes about 20 minutes to do a clean and build, even with a parallel build on a machine with 16 cores, and I'm always looking for ways to improve that. I'm looking for a way to expand that parallel build across multiple machines, so that, when I run make, it does some sort of magic: it sends the source file to the other machine, runs the compile step, and copies the .o file back to the local machine when finished, while also displaying the output of the remote compilation on the local terminal.

    I've looked for solutions to this, and I'm coming up pretty short. Does anyone know of an existing, open-source solution that will allow me to do this? The build needs to be controlled from a single instance of make, on the local machine. The closest answer I've found to my problem is to run the same build on multiple machines, which doesn't make sense for me. I want some files to be built on one node, while others are built on a different node, and then all linked on the local machine. Even if I have to check out the repository on each node, I'm ok with that.
    What can this strange device be?
    When I touch it, it gives forth a sound
    It's got wires that vibrate and give music
    What can this thing be that I found?

  2. #2
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Unless you're in the habit of touching every single source file (and header file), then perhaps this:
    https://ccache.samba.org/
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    The problem comes when new features are added, requiring tables to be added to a database. I have .h/.cpp pairs of files for each database schema that I access, which are automatically generated from the database metadata, and they contain one class per table, with fields for each column and insert/update/delete methods. The header file is included by literally every source file, so every time it changes, everything must be rebuilt. When switching between development branches, there may be different versions of the files in the various branches. In that scenario, I'm doubtful that a compiler cache would be particularly effective. If I'm wrong, I'd be interested to know how it could be made to work in my situation.

  4. #4
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Even though the files are regenerated, do they still contain exactly the same content?

    I mean, if the schema doesn't change, the generated cpp/h shouldn't change either.
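
    One way to make sure of that with a timestamp-based tool like make is to have the generator leave the file alone when the content is identical, since rewriting it with the same text still updates the modification time. A minimal sketch, assuming the generator is a script you can change; `write_if_changed` is a hypothetical helper name, not something from the project:

```python
def write_if_changed(path, new_text):
    """Write new_text to path only when it differs from what is on disk.

    Returns True if the file was (re)written, False if it was left alone.
    Leaving an identical file untouched keeps its modification time, so a
    timestamp-based build tool has no reason to rebuild its dependents.
    """
    try:
        with open(path, "r", encoding="utf-8") as existing:
            if existing.read() == new_text:
                return False  # same content: do not touch the file
    except FileNotFoundError:
        pass  # first run: nothing to compare against
    with open(path, "w", encoding="utf-8") as out:
        out.write(new_text)
    return True
```

    The generator would call this once per output file instead of opening the file for writing directly.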

    I'm not sure what ccache actually caches. If it's the output of "gcc -E" (the output of the pre-processor), then hopefully comments like "This was autogenerated by ... on ..." would not necessarily cause an otherwise identical source file to be recompiled.
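
    To make that concrete, here is a toy sketch in Python, assuming the cache key is derived from comment-stripped source text. It is not ccache's actual algorithm: `strip_comments` and `cache_key` are hypothetical names, the stripper does not expand macros or includes, and a real cache would also have to account for the compiler and its flags.

```python
import hashlib
import re

# Match comments, plus string and character literals so that "//" inside
# a literal is not mistaken for a comment. Raw string literals are ignored.
_TOKEN = re.compile(
    r'//[^\n]*'               # line comment
    r'|/\*.*?\*/'             # block comment
    r'|"(?:\\.|[^"\\])*"'     # string literal
    r"|'(?:\\.|[^'\\])*'",    # character literal
    re.DOTALL,
)


def strip_comments(source):
    """Toy stand-in for the preprocessor: drop comments, keep everything else."""
    def replace(match):
        text = match.group(0)
        if text.startswith("//"):
            return ""
        if text.startswith("/*"):
            # Keep the newlines so later line numbers do not shift.
            return " " + "\n" * text.count("\n")
        return text  # a literal: leave it untouched
    return _TOKEN.sub(replace, source)


def cache_key(source):
    """Hash the comment-stripped text; equal keys would mean a cache hit."""
    return hashlib.sha256(strip_comments(source).encode("utf-8")).hexdigest()
```

    With this, two generated files that differ only in a "generated on ..." comment get the same key, while a changed column type gives a different one.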

    There's this, but I've never set one up myself.
    Distributed builds - Jenkins - Jenkins Wiki

  5. #5
    brewbuck
    Officially An Architect
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    A frequently-changed header file that is included nearly everywhere is a design smell.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  6. #6
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by brewbuck
    A frequently-changed header file that is included nearly everywhere is a design smell.
    I generally agree, but I did explain my situation in post #3. The DB classes also contain relationship data - collections of child rows, pointers to parents, etc. Even if I had one file per table/class, adding a new table with foreign keys would necessarily affect other files. In a side-by-side comparison, with all other things being equal, a single header/cpp per schema results in a faster build than file-per-table, probably because it requires less mass storage I/O.

    I've played around with precompiled headers, and they seem to cut compilation time down by about 25%. Unfortunately, I have found no convenient way to have them generated automatically, using the CMake build system.

  7. #7
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    One thing that I found that improved things, even more than I expected, was to switch to the gold linker. It brought my link time down from about 48 seconds to about 22. I'll take improvements wherever I can find them.

  8. #8
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    No cost: use a ramdisk.
    Cost but good investment: SSD.

    Edit: should clarify, when building in parallel with CPU power in excess, the bottleneck is clearly the disk I/O. Using an SSD can cut compile times in half easily. Ramdisk is even faster than an SSD, so.
    Last edited by Epy; 06-27-2016 at 12:02 AM.

  9. #9
    Registered User
    Join Date
    Oct 2006
    Posts
    3,445
    Quote Originally Posted by Epy
    No cost: use a ramdisk.
    Cost but good investment: SSD.

    Edit: should clarify, when building in parallel with CPU power in excess, the bottleneck is clearly the disk I/O. Using an SSD can cut compile times in half easily. Ramdisk is even faster than an SSD, so.
    Ramdisk and SSD are great suggestions. Getting an SSD is not really possible, given my environment.

    The build machine is a VM running on XenServer, with disk storage on an EMC SAN, connected via 4Gb FC. The EMC has 8GB of cache memory, so performance shouldn't suffer much. The build machine is, by far, the most active machine connected to the SAN, with more than ten times the IOPS of the other VMs combined. Sure, the storage is probably still a substantial bottleneck, but not as much as if it were running on a typical PC.

    I can conclusively say that using a ramdisk did not help at all. The difference in build times was not statistically significant.

    I tested ramdisk under the following conditions:

    12GB ramdisk, both tmpfs and ramfs types
    I ran builds on code based in normal storage, with TMPDIR set when running make, and also by checking out the project completely inside the ramdisk.
    I also ran a build in normal storage, under normal conditions.

    The variation between runs, including the control run, was less than 5 seconds. I take this to mean that my project is getting completely cached in system memory, and the build is not a disk-I/O bound operation.
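
    A comparison like this can be scripted. The sketch below is an assumption about how one might do it, not the procedure used for the runs above: `time_command`, `summarize`, and `looks_different` are hypothetical helpers, and the make invocations and ramdisk path in the trailing comment are placeholders.

```python
import os
import statistics
import subprocess
import time


def time_command(cmd, runs=3, prepare=None, extra_env=None):
    """Run cmd `runs` times and return the wall-clock seconds of each run.

    prepare, if given, is run before every timed run (for example a clean).
    extra_env is merged into the current environment (for example TMPDIR).
    """
    env = {**os.environ, **(extra_env or {})}
    samples = []
    for _ in range(runs):
        if prepare is not None:
            subprocess.run(prepare, check=True, env=env)
        start = time.perf_counter()
        subprocess.run(cmd, check=True, env=env)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, spread), where spread is max minus min."""
    return statistics.mean(samples), max(samples) - min(samples)


def looks_different(a, b):
    """Crude check, not a statistical test: do the two ranges fail to overlap?"""
    return min(a) > max(b) or min(b) > max(a)


# Example use (placeholder commands and path, adjust to the project):
#   baseline = time_command(["make", "-j16"], prepare=["make", "clean"])
#   ramdisk = time_command(["make", "-j16"], prepare=["make", "clean"],
#                          extra_env={"TMPDIR": "/mnt/ramdisk"})
#   print(summarize(baseline), summarize(ramdisk),
#         looks_different(baseline, ramdisk))
```

    If the ranges of the two configurations overlap, as they did here, the storage change is not what limits the build.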

  10. #10
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    Makes sense I guess. I was thinking along the lines of compile performance on a typical PC, as you mentioned. I had the same sort of wrestling match with compiling OCE several years ago; upgrading to an SSD cut my compile time in half. Compiling with Clang instead of GCC helped too.

    Not sure if you'd want to or if it would even be a good idea for your scenario, but what I ended up doing ultimately was using droplets from DigitalOcean since I could use tons of cores + they have SSD drives. You can temporarily create as many droplets as you want, install dependencies quickly from their local repositories, do your build, and send files back to your PC. Costs less than a buck depending on how often you want to do that. I did a little performance review back when I was testing it. Sweet spot on their machines is between the 8 and 12 core option.
    Hazudra Fodder: DigitalOcean CPU and I/O performance
