Facebook invents radical new technology... a compiler

**MutantJohn** · 08-25-2015

Originally Posted by Codeplug

>> Fun details like when to garbage collect are left to the reader.
I take that back. I didn't read the claims.

Here is a typical atomic operation that may be used: https://en.wikipedia.org/wiki/Compare-and-swap

gg

Funny you mention that.

I'm looking at this : https://www.cs.cmu.edu/~410-s05/lect...1_LockFree.pdf

and they also use an atomicCAS() operation but it's in a do-while loop. How is a do-while loop more efficient than a locking mechanism? Sure, there's a larger storage overhead using a mutex per node but RAM is dirt cheap so it's almost a non-issue.

**Codeplug** · 08-25-2015

A full locking mechanism implies a system of wait queues to allow "blocking" of threads and "waking" of threads ... this is done in your operating system. So there is a context switch into the OS, and your thread being suspended by the OS while "blocking".

Compare that to these CPU instructions:

Code:

Do 
{
    cas(...)
} Until (<my cas() was successful>)

gg

**phantomotap** · 08-25-2015

Except that the linked-lists are concurrent capable.

O_o

The bloated language describing the sweep is exactly where I have a problem.

Sure, there's a larger storage overhead using a mutex per node but RAM is dirt cheap so it's almost a non-issue.

Do you think memory is the only resource?

Soma

**Mario F.** · 08-25-2015

I'm going to implement a generic statistical hypothesis test. And I have decided to implement one mutex per sample data point. With 8GB, 5 of which free, it will work great on a 1 million data point sample space. Whoosh! That fast.

**MutantJohn** · 08-25-2015

Originally Posted by Codeplug

A full locking mechanism implies a system of wait queues to allow "blocking" of threads and "waking" of threads ... this is done in your operating system. So there is a context switch into the OS, and your thread being suspended by the OS while "blocking".

Compare that to these CPU instructions:

Code:

Do 
{
    cas(...)
} Until (<my cas() was successful>)

gg

This is a good point. I like it.

Originally Posted by phantomotap

O_o

Do you think memory is the only resource?

Soma

No, but the most apparent overhead to me was memory. What are some others that you were thinking of? Was it what Codeplug already mentioned? I'm genuinely asking here.

Originally Posted by Mario F.

I'm going to implement a generic statistical hypothesis test. And I have decided to implement one mutex per sample data point. With 8GB, 5 of which free, it will work great on a 1 million data point sample space. Whoosh! That fast.

I feel like there's a lesson for me somewhere in here but I'm not sure... Can you expand?

**Mario F.** · 08-25-2015

Originally Posted by MutantJohn

I feel like there's a lesson for me somewhere in here but I'm not sure... Can you expand?

The only lesson is don't ever think of trade-offs as simple arithmetic. Also, in multithreading there is rarely an universal truth (well, perhaps except that threads are evil).

Instead, what I was trying to say (not as a lesson) is that if I implement such a system I would bog it down to a crawl, almost certainly even crash it.

It all depends on the implementation. But a mutex, to be effective, needs to check conditions, common implementations even raise exceptions. There's a considerable processing overhead to a mutex. There used to be an interesting chart listing some common mutex implementations with some comparative performance notes. Can't seem to find it anymore.

**MutantJohn** · 08-25-2015

Interesting... Now I must begin my search for a lock-free doubly linked list...

**Mario F.** · 08-25-2015

Originally Posted by MutantJohn

Interesting... Now I must begin my search for a lock-free doubly linked list...

Why would you? One of the arts of concurrency is to decide how you should partition your data and design your processes as to take the most advantage of what the OS and the PL offer you. And you are given a modest amount of information that allows you to make sensible choices. It is not about abandoning it because you want make 20 mutexes where 2 or 3 would suffice.

Whereas in comparison, implementing a lock-free doubly-linked list will have you spend the next two months on guess work and having little to no idea if whatever you chose in the end is actually the best you could do.

**MutantJohn** · 08-26-2015

Originally Posted by Mario F.

Why would you? One of the arts of concurrency is to decide how you should partition your data and design your processes as to take the most advantage of what the OS and the PL offer you. And you are given a modest amount of information that allows you to make sensible choices. It is not about abandoning it because you want make 20 mutexes where 2 or 3 would suffice.

Whereas in comparison, implementing a lock-free doubly-linked list will have you spend the next two months on guess work and having little to no idea if whatever you chose in the end is actually the best you could do.

This is a good point.

Would it instead be better to write the list in pages of memory? Something like,

Code:

std::vector< node_buffer > list_pages

? This way, we'd only need a number of mutexes equal to the number of pages or in this case,

Code:

list_pages.size()

.

Is this what Codeplug was alluding to in his post about bucket hashing?

**Alpo** · 08-27-2015

Facebook is horrible... Also in breaking news, "Is water wet?", find out what top scientists have to say on News At 9, here on KBSPL!

**Mario F.** · 08-27-2015

Originally Posted by MutantJohn

Would it instead be better to write the list in pages of memory? Something like,

Code:

std::vector< node_buffer > list_pages

? This way, we'd only need a number of mutexes equal to the number of pages or in this case,

Code:

list_pages.size()

.

I wouldn't know. That is not the point of multithreading. Multithreading is a code optimization technique. Nothing more.

Because of this, you don't want to generalize solutions. Any multithreading solution should be evaluated against your specific set of conditions. What is the expected density of additions to your list? What are the side effects? What is the nature of the data managed by your linked list? Under what conditions should your code increase or decrease the threads running on that linked list? This and many other questions you answer only on the field, with live code and experimentation.

You can certainly have an intuition of a best scenario. And try it out. It often ends up playing well. But if instead you try to come up with a generalization, you must know it will always fail on those edge and corner cases. And these are more common then you might think. A good rule of thumb is that any multithreaded code has usually a very low range of acceptable parameters.

And also because multithreading is just another optimization technique, you need to evaluate if the effort pays. That is, if you really stand anything to gain from an increase in performance or whether the measured increase in performance offers you an advantage to justify the a) work involved, b) increase in the code complexity and c) increase of the expected value of bugs in your code because of b.

One of the most damaging things you can do to your code is not evaluating it against your actual needs. I've seen (literally, not figuratively) people swoon in ecstasy over a 0.05 second gain per iteration. Only, their code is expected to run at most 10 iterations when the user clicks a button. That means, they worked for one or two full days on a multithreading problem, increased their code maintenance complexity, possibly introduced some new bugs, so that the end user gains 0.5 seconds after clicking a Save Document button.

On a blog I used to maintain a few years back, I once called them the "false prophets of performance". And these people are everywhere. They make a large chunk of the programming world and are responsible for the increased complexity of code. They respect the notion of too much optimization is evil, but forget threading is an optimization technique and only an optimization technique, easily falling prey to over-optimization or unnecessary optimization.

**Yarin** · 08-28-2015

Originally Posted by Alpo

Facebook is horrible... Also in breaking news, "Is water wet?", find out what top scientists have to say on News At 9, here on KBSPL!

Breaking news!

**MutantJohn** · 08-28-2015

Actually Mario, that's a good point. A multithreaded linked list is dumb. Linked lists typically work best when you're removing/inserting nodes constantly and they work even better when you use a pool allocator or rather something that will constantly reuse the same section of memory (so you're using placement new(), for example). It sounds like they're be a lot of contention between threads, even if you did use a lockless implementation. They also need to be small because doing a look-up is O( n ) without some form of hashing so for a large list, probing for an element is expensive.

Multithreading is fun but you're right, it's not always the solution. Hmm... Good perspective. Sometimes I get caught up in doing something just for the sake of doing it because I like parallelizing code.

Thread: Facebook invents radical new technology... a compiler

Thread Tools

Search Thread

Display

Similar Threads

borland c++ facebook api

facebook

Radical Sign

rochester institude of technology and wentworth institute of technology

radical education reform