Originally Posted by C_ntua
matsp:
Well, you get a better result for 2 and 5 threads, though the difference is small...
But generally, if you can, try again without the z=z+1;
z=z+1 means that all threads read and write the same memory location, so there would be a delay. Make z a local variable of function a instead, if you want.
Also, check whether you really do get better results for 2 and 5 threads, even slightly better.
Thanks a lot!
Yes, and without it the compiler (at least in release mode) will optimize the entire loop into nothing (an empty loop has no effect, so it isn't needed).
Reading and writing the same memory location in the same thread is not a problem; yes, it takes an extra clock cycle or so. It does roughly double the time of the loop, of course, since i++ and z++ (or z = z + 1) each take about as long as any other variable increment, and the rest of the loop is just a predictable branch.
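A minimal C++ sketch of a per-thread loop that survives a release build, assuming a hypothetical `busy_loop` function (the original threads.exe source isn't shown here). Making `z` a local variable follows the suggestion in the quote, so each thread gets its own copy; declaring it `volatile` makes every load and store observable behavior, so the optimizer is not allowed to delete the loop.

```cpp
// Hypothetical per-thread workload, not the original threads.exe code.
long busy_loop(long iterations) {
    // Local: each calling thread has its own z on its own stack.
    // volatile: every read and write of z must actually happen,
    // so a release build cannot collapse the loop to "z = iterations".
    volatile long z = 0;
    for (long i = 0; i < iterations; ++i) {
        z = z + 1;  // one load and one store per iteration
    }
    return z;  // volatile read of the final value
}
```

`z = z + 1` is used rather than `++z` because C++20 deprecates `++` and compound assignments such as `+=` on volatile variables.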
For laughs, here are the same runs without the z = z + 1:
Code:
E:\proj\threads\Debug>threads.exe 1
total time: 2.03270, sum=2.03257, avg=2.03257, max=2.03257, min=2.03257
E:\proj\threads\Debug>threads.exe 2
total time: 2.03495, sum=3.90264, avg=1.95132, max=1.95143, min=1.95121
E:\proj\threads\Debug>threads.exe 5
total time: 2.03316, sum=8.77349, avg=1.75470, max=1.93551, min=1.52791
E:\proj\threads\Debug>threads.exe 10
total time: 2.03305, sum=10.67696, avg=1.06770, max=1.62267, min=0.20301
E:\proj\threads\Debug>threads.exe 20
total time: 2.03670, sum=11.28223, avg=0.56411, max=1.22387, min=0.10130
// and release - note that these values are not valid, as the loop is not iterating:
E:\proj\threads\Release>threads.exe 1
total time: 0.00011, sum=0.00000, avg=0.00000, max=0.00000, min=0.00000
E:\proj\threads\Release>threads.exe 2
total time: 0.00018, sum=0.00000, avg=0.00000, max=0.00000, min=0.00000
E:\proj\threads\Release>threads.exe 5
total time: 0.00041, sum=0.00000, avg=0.00000, max=0.00000, min=0.00000
E:\proj\threads\Release>threads.exe 10
total time: 0.00078, sum=0.00000, avg=0.00000, max=0.00000, min=0.00000
E:\proj\threads\Release>threads.exe 20
total time: 0.00158, sum=0.00000, avg=0.00000, max=0.00000, min=0.00000
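The columns printed above (total, sum, avg, max, min) suggest a harness along these lines. This is a hypothetical C++ reconstruction using `std::thread`, not the original program, and it assumes the total work is split evenly across the threads (the near-constant total time in the Debug runs hints at that, but it's a guess). In this sketch each thread times its own loop in wall-clock time, so `sum` can exceed `total` whenever the threads' timed intervals overlap, which would also fit the Debug runs above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <numeric>
#include <thread>
#include <vector>

// One row of results, mirroring the columns printed by threads.exe.
struct Stats {
    double total;  // wall-clock time from first spawn to last join
    double sum;    // sum of the per-thread times
    double avg;    // sum / number of threads
    double max;    // slowest thread
    double min;    // fastest thread
};

using Clock = std::chrono::steady_clock;

// Per-thread workload: volatile keeps release builds from deleting the loop.
static void work(long iterations) {
    volatile long z = 0;
    for (long i = 0; i < iterations; ++i) z = z + 1;
}

// Hypothetical harness: splits total_iterations evenly across nthreads
// (any remainder is dropped) and times each thread's loop separately.
Stats run_benchmark(int nthreads, long total_iterations) {
    assert(nthreads > 0);
    const long per_thread = total_iterations / nthreads;
    std::vector<double> elapsed(nthreads, 0.0);  // one slot per thread
    std::vector<std::thread> threads;
    threads.reserve(nthreads);

    const auto start = Clock::now();
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&elapsed, t, per_thread] {
            const auto t0 = Clock::now();
            work(per_thread);
            std::chrono::duration<double> d = Clock::now() - t0;
            elapsed[t] = d.count();  // each thread writes only its own element
        });
    }
    for (auto& th : threads) th.join();
    std::chrono::duration<double> total = Clock::now() - start;

    Stats s{};
    s.total = total.count();
    s.sum = std::accumulate(elapsed.begin(), elapsed.end(), 0.0);
    s.avg = s.sum / nthreads;
    s.max = *std::max_element(elapsed.begin(), elapsed.end());
    s.min = *std::min_element(elapsed.begin(), elapsed.end());
    return s;
}
```

Calling `run_benchmark(5, N)` for some fixed N and printing the fields reproduces the shape of one output line; the absolute numbers will of course depend on N and the machine.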
The values vary a bit up and down from run to run, so I'm sure that this accounts for part of the "2 and 5 are faster than 1" result. My processor is NOT capable of running multiple threads in parallel, and any attempt to run MT code on it will just slow things down. Evidently not by much, though, since there is very little difference in overall time between the runs.
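A quick way to check whether a machine can run threads in parallel at all is `std::thread::hardware_concurrency()`, which returns the number of hardware threads, or 0 if it can't tell. A small sketch with a hypothetical `parallelism_hint` helper; note that the count includes Hyper-Threading siblings, so it is a hint rather than a core count.

```cpp
#include <string>
#include <thread>

// Hypothetical helper: classify a hardware-thread count.
// The standard allows hardware_concurrency() to return 0 when unknown.
std::string parallelism_hint(unsigned hw_threads) {
    if (hw_threads == 0) return "unknown";
    if (hw_threads == 1) return "serial";  // threads can only time-slice
    return "parallel";                     // threads can run at the same time
}

// Convenience wrapper that queries the current machine.
std::string this_machine_hint() {
    return parallelism_hint(std::thread::hardware_concurrency());
}
```

On a single-core CPU without Hyper-Threading this should report "serial", in which case extra worker threads add switching overhead without any real concurrency.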
Code:
E:\proj\threads\Debug>threads.exe 1000
total time: 3.06047, sum=2.98273, avg=0.00298, max=0.00393, min=0.00000
E:\proj\threads\Debug>threads.exe 1000
total time: 3.08970, sum=3.01170, avg=0.00301, max=0.00394, min=0.00000
E:\proj\threads\Debug>threads.exe 1000
total time: 3.08314, sum=3.00482, avg=0.00300, max=0.00394, min=0.00000
E:\proj\threads\Debug>threads.exe 1000
total time: 3.09533, sum=3.01738, avg=0.00302, max=0.00395, min=0.00288
That's four consecutive runs with 1000 threads, just to show that the timing varies a fair bit [they also show that once you get LOTS of threads running, it takes longer, even if only in the range of a few hundredths of a second].
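To put a number on "varies a fair bit", here is a small hypothetical C++ helper that summarizes the total times of repeated runs as a mean and a spread (slowest minus fastest).

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct RunSpread {
    double mean;    // average of the run times
    double spread;  // slowest run minus fastest run
};

// Hypothetical helper: summarize the total times of repeated runs.
RunSpread summarize_runs(const std::vector<double>& totals) {
    assert(!totals.empty());
    const double mean =
        std::accumulate(totals.begin(), totals.end(), 0.0) / totals.size();
    const auto mm = std::minmax_element(totals.begin(), totals.end());
    return {mean, *mm.second - *mm.first};
}
```

Fed the four total times above (3.06047, 3.08970, 3.08314, 3.09533), it gives a mean of about 3.082 s and a spread of about 0.035 s between the fastest and slowest run.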
I think the conclusion here is that the "work" in each thread far outweighs the actual cost of thread creation, switching and destruction, so the test isn't particularly meaningful as a measure of threading overhead.
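To measure the overhead that the busy loop is hiding, one option is to time threads that do no work at all. A minimal sketch, assuming C++11 `std::thread` and a hypothetical `spawn_join_seconds` helper:

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: wall-clock time to create and join n threads
// whose bodies are empty, i.e. pure thread-management overhead.
double spawn_join_seconds(int n) {
    assert(n >= 0);
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (int i = 0; i < n; ++i) threads.emplace_back([] {});  // no work
    for (auto& t : threads) t.join();
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    return d.count();
}
```

Dividing the result by n gives a rough per-thread create-and-join cost to compare against the avg column; the figure depends heavily on the OS and the machine.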
--
Mats