1. Why does that happen, though? Why do n threads accumulate n ticks per unit of time? It makes sense in a way, because the algorithm that counts 10 seconds is executed n times. Or am I missing something?

By the way, I executed your 10-second algorithm. With 8 threads I get 80 seconds. With more than 8 threads, though, I still get the same 80 seconds. I was thinking of dividing by the number of threads, but that is only useful if the number of threads is less than or equal to the number of cores. I would also like to see what happens with, say, 200 threads. Can I use clock() for that?

2. clock() is defined to give you the CPU usage for the process. Each thread that is running at a given accounting moment (e.g. a timer tick) contributes to the total time here. If you have 8 cores, the time you will consume in 8 concurrent threads that each use 10 s of CPU time is 80 seconds. Of course, if you poll for ten seconds of "wall clock time" to pass, then 200 threads will also take 80 seconds of CPU time (or at least very close to that), since the actual time it takes is 10 seconds, and you will never have more than 8 threads running at any given instant, so 80 seconds of CPU time is used up, distributed in some fashion between those threads (how it is distributed depends on the priority of those threads and the scheduler's choices as to which thread to run at what time and for how long).

That's why you need to account for both the CPU usage and the wall time to determine the most efficient number of threads for a particular problem - although if the scheduler is efficient, you may not see much difference between 8 and, say, 100 threads (the overall time and CPU time should be marginally higher for the same amount of actual work with MANY threads compared to the "ideal" number of threads, but the difference may or may not be measurable at sane thread counts). You cannot measure this sort of thing with my "wait for x seconds to pass" method, since each thread will run for the same amount of wall-clock time no matter how many threads you use.

--
Mats

3. Originally Posted by matsp
Thanks for the explanation. I generally have trouble figuring out why 200 threads run faster than 8 threads. This was already discussed in a previous thread I created. I just hoped there would be another way to measure time so I could verify my results.

One last question. Does the compiler (GCC) apply any optimization to the order in which the threads' instructions are executed? My thought is that the compiler reorders the instructions so they execute efficiently, since modern CPUs can execute more than one instruction at the same time if certain requirements are met. I also assume that while one instruction loads data from memory, another might execute at the same time, again if some requirements are met.
So, if you give two threads to execute at the same time, would the compiler do something similar by combining the instruction streams of the two threads - the parts of the code that run asynchronously? Or does the compiler optimize the code of each thread separately, and the scheduler just does the rest?