I am working on a program that uses one or more threads to process an NxN matrix. When I run the program on a 50x50 matrix with 1 thread, I get an execution time of about 13000ms. When I process the same matrix with 2 threads, I get an execution time of about 20000ms.
Obviously something is wrong, because with 2 threads it should take about half the time, i.e. 6500ms.
I was able to narrow it down to the loop that iterates through each element of the array:
Code:
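/* Each thread updates only its own rows, low..high inclusive. */
for (uint32_t i = args->low; i < args->high+1; i++) {
    for (uint32_t j = 1; j < args->size+1; j++) {
        /* New value is the average of the four neighbours in the read array. */
        args->write[i][j] = (args->read[i-1][j]+args->read[i+1][j]+args->read[i][j-1]+args->read[i][j+1])*.25;
        /* Track the largest change seen in this pass. */
        dif = fabs(args->write[i][j]-args->read[i][j]);
        if (dif > max_dif) max_dif = dif;
    }
}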
The way the threads work is that each thread is assigned a set of rows, and each thread passes over its section of the array several hundred times. For example, with two threads on a 50x50 matrix, thread 1 would get rows 0-25 and thread 2 would get rows 26-50. I have confirmed that each thread is being assigned the correct values for args->low and args->high, so they are not doing duplicate work. So it makes sense that the program should execute in half the time... but it doesn't.
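To make the row assignment concrete, here is a minimal sketch of how such a split could be computed. It assumes the interior rows are numbered 1..size with a border row on each side (which is what the j loop above suggests), and the names row_range and rows_for_thread are hypothetical, not taken from the program in question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: inclusive row range [low, high] for one thread. */
typedef struct {
    uint32_t low;
    uint32_t high;
} row_range;

/* Split interior rows 1..size into nthreads contiguous blocks whose
 * lengths differ by at most one row. tid is the 0-based thread index.
 * Assumption: rows 0 and size+1 are border rows that are only read. */
static row_range rows_for_thread(uint32_t size, uint32_t nthreads, uint32_t tid)
{
    assert(nthreads > 0 && tid < nthreads);
    uint32_t base  = size / nthreads;  /* rows every thread gets */
    uint32_t extra = size % nthreads;  /* the first `extra` threads get one more */
    row_range r;
    r.low  = 1 + tid * base + (tid < extra ? tid : extra);
    r.high = r.low + base + (tid < extra ? 1 : 0) - 1;
    return r;
}
```

With size = 50 and two threads this gives rows 1-25 and 26-50, so the ranges are disjoint and together cover every interior row exactly once.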
Looking at some debug output, I was able to see that with two threads each iteration of the loop above was taking 30-70ms. After one thread completes, each iteration for the remaining thread drops to about 30ms. With 1 thread, each iteration takes 40-80ms.
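For anyone wanting to reproduce per-iteration numbers like these, below is a minimal sketch of wall-clock timing on a POSIX system. It is an assumption about how such measurements could be taken, not the debug code actually used here; now_monotonic and elapsed_ms are hypothetical helper names. It uses clock_gettime with CLOCK_MONOTONIC because the standard clock() function reports processor time for the whole process, which on common implementations adds up the time of all threads and can make a multithreaded run look slower than it is.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: read the monotonic wall clock (POSIX). */
static struct timespec now_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts;
}

/* Hypothetical helper: milliseconds elapsed between two readings. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}
```

A thread would call now_monotonic() before and after one pass over its rows and print elapsed_ms(start, end) for that pass.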
Does anyone know what is happening to cause this?
The processor in my system is an Intel i5, and the machine is almost completely idle when my program is not running. So there should be no hardware constraints on getting the expected improvement with 2 threads.