You will get a slowdown. The "number of elements" is set by a command-line parameter.
It simply divides the work across N threads, where N is the number of processors in your system.
Oh, you're right. My bad, I read the output from 'time' wrong XD
Edit:
I hope this doesn't sound too brown nosey, Elysia, but looking at your code is kind of a privilege to me. You're really good at C++. Like shoot, you're really good. I'm learning so much by looking at your code, thank you.
Last edited by MutantJohn; 11-26-2013 at 11:44 AM.
Quote: "You don't have control over cache (the hardware manages it, period)"

We do have a little bit of control over the cache: there are prefetch instructions, instructions that tell the CPU to read from memory but not cache the result (because you know it will only be needed once), and instructions to cache it only in the higher levels (because you know that by the time it is needed again it would have been flushed from L1 anyway even if you cached it there, but you still want it in L2).
GCC supports them through __builtin_prefetch, where the programmer gives "hints" about how much locality the fetched data has (whether it will be needed again later, etc.), and GCC decides how to implement that on the target.
Data Prefetch Support - GNU Project - Free Software Foundation (FSF)
Quote: "For x86 processors, it appears that those data prefetch options only apply to SSE/MMX instructions, not to general-purpose instructions that involve memory accesses."

The prefetch instructions are part of SSE/MMX, but the data they prefetch can be used by any instruction.
Not everything that can be counted counts, and not everything that counts can be counted
- Albert Einstein.
No programming language is perfect. There is not even a single best language; there are only languages well suited or perhaps poorly suited for particular purposes.
- Herbert Mayer
Another issue is that a simple search may be limited by memory bandwidth regardless of cache usage. In that case, sequential or near-sequential accesses to RAM are faster because of the RAS/CAS delays: streaming through memory mostly hits already-open rows, while parallel threads hitting different points in RAM keep triggering the longer delays.