clock()

**C_ntua** · 10-07-2008

Why does that happen, though? Why do n threads accumulate n ticks per unit of time? It makes sense in a way, because the algorithm that waits for 10 seconds is executed n times. Or am I missing something?

By the way, I ran your 10-second algorithm. For 8 threads I get 80 seconds. For more than 8 threads, though, I get the same 80 seconds. I was thinking of dividing by the number of threads, but that is only useful if the number of threads is less than or equal to the number of cores. I would also like to see what happens with something like 200 threads. Can I use clock() for that?

**matsp** · 10-07-2008

clock() is defined to give you the CPU usage for the process. Each thread that is running at a given accounting moment (e.g. a timer tick) contributes to that total. If you have 8 cores, 8 concurrent threads that each use 10s of CPU time will consume 80 seconds in total. Of course, if you poll until ten seconds of "wall clock time" have passed, then 200 threads will also take 80 seconds of CPU time (or very close to that): the actual elapsed time is 10 seconds and you will never have more than 8 threads running at any given moment, so 80 seconds of CPU time is used up, distributed in some fashion between those threads. How it is distributed depends on the priority of the threads and on the scheduler's choices as to which thread to run at what time and for how long.

That's why you need to account for both CPU usage and wall-clock time to determine the most efficient number of threads for a particular problem. If the scheduler is efficient, you may not see much difference between 8 and, say, 100 threads. The overall time and CPU time should be marginally higher for the same amount of actual work with MANY threads than with the "ideal" number of threads, but that may or may not be measurable at sane thread counts. You cannot measure this sort of thing using my "wait for x seconds to pass" method, since each thread runs for the same amount of wall-clock time no matter how many threads you use.

--
Mats

**C_ntua** · 10-07-2008

Thanks for the explanation. I am still having trouble figuring out why 200 threads run faster than 8 threads. This was already discussed in a previous thread I created. I was just hoping there would be another way to measure time so that I could verify my results.

One last question. Does the compiler (GCC) do any optimization of the order in which threads are executed? My thought is that the compiler re-orders the instructions so they are executed in an optimized way, since modern CPUs can execute more than one instruction at the same time if they meet some requirements. I also assume that while one instruction loads data from memory, another might execute at the same time, again if some requirements are met.
So, if you give it two threads to execute at the same time, would the compiler do something similar, combining the instruction streams of the two threads for the parts of the code that run asynchronously? Or does the compiler optimize the code for each thread, and the scheduler just does the rest?

**matsp** · 10-08-2008

The compiler does not know anything about the fact that your code runs as threads.
Also, as each thread runs on a separate core, the ordering of instructions between threads does not matter - the processor itself will re-order your instructions as it sees fit.

The compiler (given the right processor model options) will try to put instructions in the best order for the given processor, so it will, for example, place a "load" instruction two or three instructions before the loaded value is used. Of course, that's not always possible, but if it has something else useful that it can put in there, then it will.

--
Mats

**C_ntua** · 10-08-2008

Well, if I run the program with more threads than cores, then some cores will have more than one thread. But from what you are saying, there are no optimizations across threads, so the scheduler will just decide how the threads are run.

Anyway, thanks for your help. I've run out of ideas for how to figure this out.
