Mmm, where to begin...
Are your threads arranged in producer/consumer pairs, where one thread does some work and then passes it on.
Or as a worker pool, where a controller divides up work into a number of segments, passes each work unit to a thread and then waits around to collect the results.
> The code is riddled with global state variables which are modified in many places, often with no synchronization.
The problem with your incremental approach is that you don't know how finely balanced the existing system is. Any change you make can have far reaching consequences that could take a long time before they become apparent.
> Tests are challenging because of the large number of globals.
The tests you need at this stage should be treating your code as a black box. Data goes in, results come out.
A large corpus of input (both real world and simulated) is about the best you can hope for.
So for each change you make, you have a known data set to feed in, to compare with a known set of corresponding results.
> Another technique I've started to use is switching from a variable with global scope to file scope, then writing get/set functions with a file scope mutex.
This risks turning your multi-threaded code into a poorly performing lock-happy single-threaded application because every single thread winds up in someone else's queue for no good reason.
One thing to establish is which variables are used by which threads, and when.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
#include <time.h>
#define MAX_EVENTS 1000000
typedef struct {
pthread_t tid;
struct timespec time;
} event_t;
event_t events[MAX_EVENTS];
int current_event = 0;
void record_time(void) {
int n = __atomic_fetch_add(¤t_event,1,__ATOMIC_SEQ_CST);
if( n < MAX_EVENTS ) {
events[n].tid = pthread_self();
clock_gettime(CLOCK_REALTIME, &events[n].time);
}
}
void print_times(FILE *fp) {
for(int i = 0 ; i < current_event; i++ ) {
fprintf(fp,"%lx %ld:%ld\n", events[i].tid,
events[i].time.tv_sec, events[i].time.tv_nsec);
}
}
// Some global variable which may (or may not) be shared.
int global_var;
#define MAX_LOOP 10000
void *worker(void *p) {
for(int i = 0 ; i < MAX_LOOP ; i++ ) {
if( rand() % 100 < 50 ) {
global_var++;
record_time();
}
usleep(1);
}
return NULL;
}
int main()
{
pthread_t tids[5];
for(int i = 0 ; i < 5 ; i++ ) {
pthread_create(&tids[i],NULL,worker,NULL);
}
for(int i = 0 ; i < 5 ; i++ ) {
pthread_join(tids[i],NULL);
}
printf("Final result in global=%d\n", global_var);
printf("Number of debug records=%d\n", current_event);
print_times(stdout);
}
$ ./a.out | head
Final result in global=24862
Number of debug records=24867
7f37e89eb700 1592026468:827352674
7f37ea1ee700 1592026468:827395393
7f37e89eb700 1592026468:827410460
7f37e81ea700 1592026468:827431857
7f37ea1ee700 1592026468:827460899
7f37e81ea700 1592026468:827499052
7f37e91ec700 1592026468:827500700
7f37ea1ee700 1592026468:827519506
Maybe you'll find that particular variables are used by only one thread.
Maybe you'll find that particular variables are used by two threads, but at very different times.
Perhaps tweak the tracer to
- record whether it was a read or write access
- record only the first and last access times on a per thread basis
- record the number of times on a per thread basis
Another option (maybe) is to use helgrind, part of valgrind.
Code:
$ gcc -g bar.c -pthread
$ valgrind --tool=helgrind ./a.out
==6496== Helgrind, a thread error detector
==6496== Copyright (C) 2007-2015, and GNU GPL'd, by OpenWorks LLP et al.
==6496== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==6496== Command: ./a.out
==6496==
==6496== ---Thread-Announcement------------------------------------------
==6496==
==6496== Thread #3 was created
==6496== at 0x51643DE: clone (clone.S:74)
==6496== by 0x4E46149: create_thread (createthread.c:102)
==6496== by 0x4E47E83: pthread_create@@GLIBC_2.2.5 (pthread_create.c:679)
==6496== by 0x4C34BB7: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==6496== by 0x400A7F: main (bar.c:52)
==6496==
==6496== ---Thread-Announcement------------------------------------------
==6496==
==6496== Thread #2 was created
==6496== at 0x51643DE: clone (clone.S:74)
==6496== by 0x4E46149: create_thread (createthread.c:102)
==6496== by 0x4E47E83: pthread_create@@GLIBC_2.2.5 (pthread_create.c:679)
==6496== by 0x4C34BB7: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==6496== by 0x400A7F: main (bar.c:52)
==6496==
==6496== ----------------------------------------------------------------
==6496==
==6496== Possible data race during read of size 4 at 0x6010A0 by thread #3
==6496== Locks held: none
==6496== at 0x400A09: worker (bar.c:37)
==6496== by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==6496== by 0x4E476B9: start_thread (pthread_create.c:333)
==6496==
==6496== This conflicts with a previous write of size 4 by thread #2
==6496== Locks held: none
==6496== at 0x400A12: worker (bar.c:37)
==6496== by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==6496== by 0x4E476B9: start_thread (pthread_create.c:333)
==6496== Address 0x6010a0 is 0 bytes inside data symbol "global_var"
==6496==
==6496== ----------------------------------------------------------------
==6496==
==6496== Possible data race during write of size 4 at 0x6010A0 by thread #3
==6496== Locks held: none
==6496== at 0x400A12: worker (bar.c:37)
==6496== by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==6496== by 0x4E476B9: start_thread (pthread_create.c:333)
==6496==
==6496== This conflicts with a previous write of size 4 by thread #2
==6496== Locks held: none
==6496== at 0x400A12: worker (bar.c:37)
==6496== by 0x4C34DB6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==6496== by 0x4E476B9: start_thread (pthread_create.c:333)
==6496== Address 0x6010a0 is 0 bytes inside data symbol "global_var"
==6496==
Final result in global=9
Number of debug records=9
5c27700 1592027935:822210625
5c27700 1592027935:822605329
6828700 1592027935:823178972
5c27700 1592027935:823200774
5c27700 1592027935:823276338
5c27700 1592027935:823405036
6828700 1592027935:823456170
5c27700 1592027935:823470497
5c27700 1592027935:823535862
==6496==
==6496== For counts of detected and suppressed errors, rerun with: -v
==6496== Use --history-level=approx or =none to gain increased speed, at
==6496== the cost of reduced accuracy of conflicting-access information
==6496== ERROR SUMMARY: 14 errors from 2 contexts (suppressed: 101 from 7)
For your program, you're going to get a LOT of output.
I think before you do too much actual code changing, you need to gather as much information as possible.
- probing the thread scope of each global variable.
- creating a design for what you have (warts and all).
- creating a design for what you want (as if you were starting from afresh).
- work out a planned transformation from where you are to where you want to be.
Bit of a long ramble I'm afraid.