Parallel programming: how to tell threads are being scheduled across the cores?

**Angus** · 05-07-2009

I'm using threads for the first time to maximize usage of a multi-core processor. I'd like to know if all the cores are being used to their maximum potential, but I don't know how.
I was told that for a while MySQL had a flaw wherein it was not properly using the free Linux kernel to schedule its threads. However, I've never seen any calls that direct scheduling. I find this confusing, so another question I have is, is the onus on the programmer to cause threads to be scheduled on different cores?

**matsp** · 05-07-2009

Under normal circumstances, threads are scheduled on all threads.

--
Mats

**Angus** · 05-07-2009

Originally Posted by matsp

Under normal circumstances, threads are scheduled on all threads.
Mats

Threads are scheduled on all cores, you mean? So that's all there is to it? Does anyone know about this problem where the MySQL server didn't operate under normal circumstances? What about OpenMP? What's that for, if not normal circumstances?

**brewbuck** · 05-07-2009

Originally Posted by Angus

Threads are scheduled on all cores, you mean? So that's all there is to it? Does anyone know about this problem where the MySQL server didn't operate under normal circumstances? What about OpenMP? What's that for, if not normal circumstances?

The only reason I can think that MySQL might have anecdotally been doing such a thing would be if it was deliberately messing around with scheduling. You have to TRY to break it. By default, threads are intelligently scheduled across cores.

**matsp** · 05-07-2009

Yes, brewbuck explains much better than me - if you just create threads, all cores will be used (as long as the OS itself is working correctly, of course).

By "messing" with the OS's scheduling, you are HIGHLY likely to undo the work done to make the OS's scheduling optimal. There are some special cases where this is not true, but those are about as common as finding compiler bugs or OS bugs. If you do not KNOW FOR CERTAIN, then you most likely should not try to "outsmart" the OS.

There are tools that can tell you what the CPU load is on each CPU. "ps" will be one of those, but other tools may be "better".

--
Mats

**Codeplug** · 05-07-2009

Some reading on the subject: Take charge of processor affinity

gg

**Elkvis** · 05-07-2009

I have a pair of Dell PowerEdge 6850 servers at work, and they each have four dual-core Pentium 4 chips. Each core supports hyperthreading, so I have a total of 16 logical cores per machine. I find that MySQL's InnoDB tables run much faster if I limit the engine to 3 cores. This was actually suggested by the techs at MySQL, when I had a support contract with them. It seems counter-intuitive, but I suspect that it has to do with file I/O and other hardware issues.

**stabu** · 05-19-2009

If you have the "top" command, run it and press "1".
This will show the individual load on each of your cores.
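
For anyone who wants the same per-core numbers programmatically: on Linux, tools such as "top" get them from the per-core counters in /proc/stat. The Python sketch below only illustrates that idea and is not how any particular tool is implemented. The function names (parse_proc_stat, per_core_load, sample_proc_stat) are made up for this example, and it assumes the usual /proc/stat column order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time

def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-core line."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines are "cpu0", "cpu1", ...; the aggregate line is just "cpu".
        if not fields or not fields[0].startswith("cpu") or not fields[0][3:].isdigit():
            continue
        # Assumed order: user, nice, system, idle, iowait, irq, softirq, steal.
        ticks = [int(x) for x in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        result[fields[0]] = (total - idle, total)
    return result

def per_core_load(before, after):
    """Percentage of non-idle time per core between two snapshots."""
    loads = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        loads[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return loads

def sample_proc_stat(interval=1.0, path="/proc/stat"):
    """Linux only: read the counters twice, `interval` seconds apart."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return per_core_load(before, after)
```

Calling sample_proc_stat() on a Linux box returns a dict such as {"cpu0": ..., "cpu1": ...} with one percentage per core, averaged over the interval.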

**Angus** · 05-20-2009

Interesting. I guess it takes some experience to know what those numbers mean. After all, it's not enough to know that all the cores are being used; what matters is that they are all being used at the same time.

**matsp** · 05-20-2009

Originally Posted by Angus

Interesting. I guess it takes some experience to know what those numbers mean. After all, it's not enough to know that all the cores are being used; what matters is that they are all being used at the same time.

If you see 100% load on all processors (or NEAR 100%), then that would indicate that all processors are being used.

Do you have any reason to believe that NOT all cores are being used?

--
Mats

**brewbuck** · 05-20-2009

Originally Posted by Elkvis

I have a pair of Dell PowerEdge 6850 servers at work, and they each have four dual-core Pentium 4 chips. Each core supports hyperthreading, so I have a total of 16 logical cores per machine. I find that MySQL's InnoDB tables run much faster if I limit the engine to 3 cores. This was actually suggested by the techs at MySQL, when I had a support contract with them. It seems counter-intuitive, but I suspect that it has to do with file I/O and other hardware issues.

Even modern operating systems aren't perfect at scheduling threads across cores. Threads might be jumping from core to core unnecessarily, which destroys CPU cache performance. Limiting the set of schedulable cores makes this jumping less likely, leading to better cache performance.

You could consider that a "bug" in the OS, but a general-purpose scheduler will never do the "right thing" in all possible situations, even though it tries... really hard. Take a look at a scheduler sometime.
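
One way to actually watch the jumping on Linux: field 39 of /proc/<pid>/task/<tid>/stat is the CPU the thread last ran on (see proc(5)). Below is a small Python sketch with made-up helper names (parse_last_cpu, last_cpu_of_thread). The only subtle part is that the command name in field 2 may itself contain spaces or parentheses, so the line is split after the last ')'.

```python
def parse_last_cpu(stat_line):
    """Return the 'processor' field (field 39) from one /proc stat line."""
    # Everything after the last ')' starts at field 3 (state),
    # so field 39 sits at index 39 - 3 in the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[39 - 3])

def last_cpu_of_thread(pid, tid):
    """Linux only: CPU number that thread `tid` of process `pid` last ran on."""
    with open(f"/proc/{pid}/task/{tid}/stat") as f:
        return parse_last_cpu(f.read())
```

Polling last_cpu_of_thread for each thread a few times a second gives a rough picture of how often it migrates. It is only a rough picture, since a thread can move and move back between two polls.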

**Angus** · 05-21-2009

Originally Posted by matsp

If you see 100% load on all processors (or NEAR 100%), then that would indicate that all processors are being used.

You're quite sure about that? I have some critical questions about the way such measurements are taken. Is it a reflection of the true mean CPU usage over the sample time, or is that subject to the granularity of the tool? What if there's a spike in CPU usage when one sample is taken, then nothing for a while, then another spike when the next sample is taken?
Also, can someone answer the same question about the "perfmon" tool in WinXP? I'm also developing an app there which is real-time. It's using a DirectX sound buffer, which suffers from an underrun sometimes, and those times seem to coincide with other processes doing work, but I don't see much more than 20% cpu usage on perfmon.

Do you have any reason to believe that NOT all cores are being used?

Only insofar as I don't believe everything I read

I was given the impression that the open Linux kernel doesn't schedule threads very well, but folk around here seem confident otherwise, so I guess I'll accept that for now.

**matsp** · 05-21-2009

CPU usage is measured in the kernel in some way. The USUAL way to do this is by measuring time from "schedule in" to "schedule out" for each process. The accounting displayed by "top" or some such is the sum of the CPU measurement for all processes EXCEPT the "idle"/"null" process. It is sampled over a period of time (e.g. 1 second), so if it says 90%, 0.9 seconds of CPU time was used by all processes except IDLE for that 1 second period. The remaining 0.1 second was in idle.

Obviously, the measurement isn't necessarily extremely precise. For example, interrupt execution may be counted against the currently running process, even if the interrupt work was actually initiated by a different process. Say you have a web server that queues up 1MB of HTML to send out, then goes to sleep waiting for all that data to be sent off, and then a math process starts. Because we are using a rather inefficient network card, 3% of the time is spent in the interrupt handler, so whilst our math process indicates 99.5% CPU time usage, it actually used only 96.5%, because 3% was used by the network card, performing work for the web server that is currently sleeping.

Little spurts of CPU (within the sample period, e.g. 1 second) will show up averaged over the time-period, so if a process runs constantly for 200ms in that 1 second interval, it will show up as 20% CPU usage [assuming nothing else runs in that same time].
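
To put the accounting above in concrete terms, here is a toy model in Python. It is not kernel code, and cpu_usage_percent and the millisecond interval lists are invented for illustration. It adds up the time between each "schedule in" and "schedule out" that falls inside the sample window, then divides by the window length. Because run time is accumulated rather than sampled at one instant, a short spike right at the edge of a window only contributes its actual duration.

```python
def cpu_usage_percent(run_intervals, window_start, window_end):
    """Percentage of the window during which the CPU ran non-idle work.

    run_intervals: list of (schedule_in, schedule_out) times, assumed
    non-overlapping and in the same unit as the window bounds.
    """
    busy = 0
    for sched_in, sched_out in run_intervals:
        # Only count the part of the run that lies inside the window.
        overlap = min(sched_out, window_end) - max(sched_in, window_start)
        if overlap > 0:
            busy += overlap
    return 100.0 * busy / (window_end - window_start)
```

With times in milliseconds, cpu_usage_percent([(300, 500)], 0, 1000) gives 20.0, matching the 200ms example. Two 10ms spikes at either end of the window, [(0, 10), (990, 1000)], give 2.0 rather than anything near 100.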

As to scheduling in Linux, it will absolutely certainly schedule things on all available processors UNLESS the process creating the thread goes out of its way to PREVENT that (by setting the thread affinity). And whilst setting the thread affinity MAY help at times, it is most often NOT helpful. The MySQL example given is probably one of the few cases where this is beneficial, and the reason is that MySQL does a lot of locking and disk IO, whilst sharing some global data, so the different threads NEED to share the same data, and are often scheduled for short periods of time.
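
For reference, this is roughly what "setting the affinity" looks like from Python on Linux, using os.sched_getaffinity and os.sched_setaffinity. The pick_cores and limit_to_cores helpers are invented for this sketch. On Linux a pid of 0 refers to the calling thread, and threads or child processes started afterwards inherit the mask. Treat it as something to experiment and measure with, not as a recommendation.

```python
import os

def pick_cores(allowed, limit):
    """Choose up to `limit` cores from the allowed set (lowest IDs first)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return set(sorted(allowed)[:limit])

def limit_to_cores(limit):
    """Restrict the caller to at most `limit` of its currently allowed cores.

    Returns the previous affinity mask so it can be restored later,
    or None on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    previous = os.sched_getaffinity(0)
    os.sched_setaffinity(0, pick_cores(previous, limit))
    return previous
```

Something like previous = limit_to_cores(3) before starting the worker threads would mimic the "limit the engine to 3 cores" experiment, and os.sched_setaffinity(0, previous) undoes it for the calling thread.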

--
Mats

**Wraithan** · 05-22-2009

If you are considering doing your own scheduling, and you get it to work REALLY well in your test environment, keep in mind that your application isn't going to be the only one running. Even if your app is rather resource intensive, it may not be the most resource intensive, so your scheduling will not only slow down your app (by trying to assign threads to full cores in some cases) but will also slow down others that try to take advantage of the OS's built-in scheduling.
